Abstract

We address the problem of minimal-change integrity maintenance in the context of integrity constraints in relational databases. We assume that integrity-restoration actions are limited to tuple deletions. We identify two basic computational issues: repair checking (is a database instance a repair of a given database?) and consistent query answers [ABC99] (is a tuple an answer to a given query in every repair of a given database?). We study the computational complexity of both problems, delineating the boundary between the tractable and the intractable. We consider denial constraints, general functional and inclusion dependencies, as well as key and foreign key constraints. Our results shed light on the computational feasibility of minimal-change integrity maintenance. The tractable cases should lead to practical implementations. The intractability results highlight the inherent limitations of any integrity enforcement mechanism, e.g., triggers or referential constraint actions, as a way of performing minimal-change integrity maintenance.



1 Introduction 

Inconsistency is a common phenomenon in the database world today. Even though integrity constraints successfully capture data semantics, the actual data in the database often fails to satisfy such constraints. This may happen because the data is drawn from a variety of independent sources (as in data integration [Len02]) or is involved in complex, long-running activities like workflows.

How should inconsistent data be handled? The traditional way is not to allow the database to become inconsistent, by aborting updates or transactions leading to integrity violations. We argue that in present-day applications this scenario is becoming increasingly impractical. First, if a violation occurs because data from multiple, independent sources is being merged [LM96], there is no single update responsible for the violation. Moreover, the updates have typically already committed. For example, if we know that a person should have a single address but multiple data sources contain different addresses for the same person, it is not clear how to fix this violation by aborting some update. Second, the data may have become inconsistent through the execution of some complex activity, and it is no longer possible to trace the inconsistency to a specific action.

In the context of triggers or referential integrity, more sophisticated methods for handling integrity violations have been developed. For example, instead of being aborted, an update may be propagated. In general, the result is at best a consistent database state, typically with no guarantees on its distance from the original, inconsistent state (the research reported in [LML97] is an exception).

In our opinion, integrity restoration should be a separate process that is executed after an inconsistency is detected. The restoration should have a minimal impact on the database by trying to preserve as many tuples as possible. From now on, we call this scenario minimal-change integrity maintenance.

One can interpret the postulate of minimal change in several different ways, depending on whether the information in the database is assumed to be correct and complete. If the information is complete but not necessarily correct (it may violate integrity constraints), the only way to fix the database is by deleting some parts of it. If the information is both incorrect and incomplete, then both insertions and deletions should be considered. In this paper we focus on the first case. Since we are working in the context of the relational data model, we consider tuple deletions. Such a scenario is common in data warehouse applications, where dirty data coming from many sources is cleaned in order to be used as a part of the warehouse itself. On the other hand, in some data integration approaches, e.g., [Len02, LLR02], the completeness assumption is not made. For large classes of constraints, e.g., denial constraints, the restriction to deletions has no impact, since only deletions can remove integrity violations. We return to the issue of minimal change in Section 2.6.

We claim that a central notion in the context of integrity restoration is that of a repair [ABC99]. A repair is a database instance that satisfies integrity constraints and minimally differs from the original database (which may be inconsistent). Because we consider only tuple deletions as ways to restore database consistency, the repairs in our framework are subsets of the original database instance.

The basic computational problem in this context is repair checking, namely checking whether a given database instance is a repair of the original database. The complexity of this problem is studied in the present paper. The PTIME algorithms for repair checking given here can be easily adapted to nondeterministically compute repairs (as we show).

Sometimes, when the data is obtained online from multiple, autonomous sources, it is not possible to restore consistency. In that case one has to settle for computing, in response to queries, consistent query answers [ABC99], namely answers that are true in every repair of the given database. Such answers constitute a conservative "lower bound" on the information present in the database. The problem of computing consistent query answers is the second computational problem studied in the present paper. We note that the notion of consistent query answer proposed in [ABC99] has been used and extended, among others, in [ABC00, GGZ01, LLR02, ABC+03, Wij03]. However, none of these papers presents a comprehensive and complete computational complexity picture.

We now describe the setting of our results. We analyze the computational complexity of repair checking and consistent query answers along several different dimensions. We characterize the impact of the following parameters:

• the class of queries: quantifier-free queries, conjunctive queries, and simple conjunctive queries (conjunctive queries without repeated relation symbols);

• the class of integrity constraints: denial constraints, functional dependencies (FDs), inclusion dependencies (INDs), and FDs and INDs together. We also consider practically important subclasses of FDs and INDs: key functional dependencies and foreign key constraints [Dat81];

• the number of integrity constraints.

As a result we obtain several new classes for which both repair checking and consistent 
query answers are in PTIME: 

• queries: ground quantifier-free, constraints: arbitrary denial; 

• queries: closed simple conjunctive, constraints: functional dependencies (at most one 
FD per relation); 

• queries: ground quantifier-free or closed simple conjunctive, constraints: key func- 
tional dependencies and foreign key constraints, with at most one key per relation. 

Additionally, we show that repair checking (but not consistent query answers) is in PTIME for arbitrary FDs and acyclic INDs. The results obtained are tight in the sense that relaxing any of the above restrictions leads to co-NP-hard problems, as we prove. (This, of course, does not preclude the possibility that introducing additional, orthogonal restrictions could lead to more PTIME cases.) To complete the picture, we show that for arbitrary sets of FDs and INDs, repair checking is co-NP-complete and consistent query answers is $\Pi^p_2$-complete.

Our results shed light on the computational feasibility of minimal-change integrity maintenance. The tractable cases should lead to practical implementations. The intractability results highlight the inherent limitations of any integrity enforcement mechanism, e.g., triggers or referential constraint actions [MS02, LML97], as ways of performing minimal-change integrity maintenance using tuple deletions.

The plan of the paper is as follows. In Section 2, we define the basic concepts. In Section 3, we consider denial constraints. In Section 4, we discuss INDs together with FDs. In Section 5, we summarize related research, and in Section 6 we draw conclusions and discuss future work. An earlier version of the results in Section 3 was presented in [CM02].



2 Basic Notions 



In the following we assume we have a fixed relational database schema R consisting of a finite set of relations. We also have a fixed, infinite database domain D, consisting of uninterpreted constants, and a numeric domain N. Those domains are disjoint. The database instances can be seen as finite, first-order structures over the given schema that share the domain D. Every attribute in U is typed; thus, in a single attribute, all the instances of R can contain only elements of D or only elements of N. Since each instance is finite, it has a finite active domain, which is a subset of D ∪ N. As usual, we allow the standard built-in predicates over N ($=, \neq, <, >, \leq, \geq$) that have infinite, fixed extensions. With all these elements we can build a first-order language $\mathcal{L}$.

2.1 Integrity Constraints 

Integrity constraints are closed first-order $\mathcal{L}$-formulas. In the sequel we will denote relation symbols by $P_1, \ldots, P_m$, tuples of variables and constants by $\bar{x}_1, \ldots, \bar{x}_m$, and a conjunction of atomic formulas referring to built-in predicates by $\varphi$.

In this paper we consider the following basic classes of integrity constraints: 

1. Denial constraints: $\mathcal{L}$-sentences

$$\forall \bar{x}_1, \ldots, \bar{x}_m.\ \neg[P_1(\bar{x}_1) \wedge \cdots \wedge P_m(\bar{x}_m) \wedge \varphi(\bar{x}_1, \ldots, \bar{x}_m)].$$

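2. Functional dependencies (FDs): $\mathcal{L}$-sentences

$$\forall \bar{x}_1, \bar{x}_2, \bar{x}_3, \bar{x}_4, \bar{x}_5.\ [P(\bar{x}_1, \bar{x}_2, \bar{x}_4) \wedge P(\bar{x}_1, \bar{x}_3, \bar{x}_5) \Rightarrow \bar{x}_2 = \bar{x}_3],$$

where the $\bar{x}_i$ are sequences of distinct variables. A more familiar formulation of the above FD is $X \rightarrow Y$, where $X$ is the set of attributes of $P$ corresponding to $\bar{x}_1$, and $Y$ the set of attributes of $P$ corresponding to $\bar{x}_2$ (and $\bar{x}_3$). Clearly, functional dependencies are a special case of denial constraints.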

3. Inclusion dependencies (INDs): $\mathcal{L}$-sentences

$$\forall \bar{x}_1\, \exists \bar{x}_3.\ [Q(\bar{x}_1) \Rightarrow P(\bar{x}_2, \bar{x}_3)],$$

where the $\bar{x}_i$ are sequences of distinct variables with $\bar{x}_2$ contained in $\bar{x}_1$, and $P$, $Q$ are database relations. Again, this is often written as $Q[Y] \subseteq P[X]$, where $X$ (resp. $Y$) is the set of attributes of $P$ (resp. $Q$) corresponding to $\bar{x}_2$. If $P$ and $Q$ are clear from the context, we omit them and write the dependency simply as $Y \subseteq X$. Full inclusion dependencies are those expressible without the existential quantifiers.

Given a set IC of FDs and INDs, a relation $P$, and a set of attributes $X$ which is a key of $P$ w.r.t. IC, we say that each FD $X \rightarrow Y \in$ IC is a key dependency and each IND $Q[Y] \subseteq P[X] \in$ IC is a foreign key constraint. If, additionally, $X$ is the primary key of $P$, then both kinds of dependencies are termed primary.

Definition 1 Given a database instance r of R and a set of integrity constraints IC, we say that r is consistent if $r \models$ IC in the standard model-theoretic sense; inconsistent otherwise.
2.2 Repairs

Given a database instance r, the set $S(r)$ of facts of r is the set of ground atomic formulas $\{P(\bar{a}) \mid r \models P(\bar{a})\}$, where $P$ is a relation name and $\bar{a}$ a ground tuple.

Definition 2 The distance $\Delta(r, r')$ between database instances r and r' is defined as $\Delta(r, r') = S(r) - S(r')$. ■

Definition 3 For the instances r, r', r'', we write $r' \leq_r r''$ if $\Delta(r, r') \subseteq \Delta(r, r'')$, i.e., if the distance between r and r' is less than or equal to the distance between r and r''. ■

Definition 4 Given a set of integrity constraints IC and database instances r and r', we say that r' is a repair of r w.r.t. IC if $r' \models$ IC and r' is $\leq_r$-minimal in the class of database instances that satisfy IC. ■

If r' is a repair of r, then $S(r')$ is a maximal consistent subset of $S(r)$. We denote by $\mathit{Repairs}_{IC}(r)$ the set of repairs of r w.r.t. IC. This set is nonempty, since the empty database instance satisfies every set of FDs and INDs.
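To make Definitions 2 to 4 concrete, here is a minimal brute-force sketch, assuming (as the text does) that repairs are subsets of the original instance, so a repair is a maximal subset of $S(r)$ that satisfies IC. Facts are modeled as arbitrary hashable values and IC as a caller-supplied predicate; the names `repairs`, `is_repair`, and `satisfies_fd` are hypothetical, and the exhaustive search is meant only for tiny instances.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Hashable, List

Instance = FrozenSet[Hashable]


def repairs(r: Instance, consistent: Callable[[Instance], bool]) -> List[Instance]:
    """All repairs of r: subsets of r that satisfy IC and are maximal under inclusion.

    Exhaustive (exponential) search, for illustration only. Subsets are visited
    from largest to smallest, so a consistent subset is a repair iff it is not
    strictly contained in a repair found earlier.
    """
    facts = list(r)
    found: List[Instance] = []
    for size in range(len(facts), -1, -1):
        for subset in combinations(facts, size):
            candidate = frozenset(subset)
            if consistent(candidate) and not any(candidate < rep for rep in found):
                found.append(candidate)
    return found


def is_repair(r: Instance, r_prime: Instance, consistent: Callable[[Instance], bool]) -> bool:
    """Repair checking: is r' one of the repairs of r?"""
    return r_prime in repairs(r, consistent)


# Hypothetical usage: facts are (A, B) pairs and IC is the FD A -> B.
def satisfies_fd(inst: Instance) -> bool:
    return all(not (s[0] == t[0] and s[1] != t[1]) for s in inst for t in inst)


r = frozenset({("a", 1), ("a", 2), ("b", 3)})
print(len(repairs(r, satisfies_fd)))  # prints 2
```

The search does not assume that consistency is preserved under taking subsets (it is not, for INDs), so the same sketch applies to any class of constraints that can be written as a predicate.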

2.3 Queries 

Queries are formulas over the same language $\mathcal{L}$ as the integrity constraints. A query is closed (or a sentence) if it has no free variables. A closed query without quantifiers is also called ground. Conjunctive queries [CM77, AHV95] are queries of the form

$$\exists \bar{x}_1, \ldots, \bar{x}_m.\ [P_1(\bar{x}_1) \wedge \cdots \wedge P_m(\bar{x}_m) \wedge \varphi(\bar{x}_1, \ldots, \bar{x}_m)].$$

If a conjunctive query has no repeated relation symbols, it is called simple.
The following definition is standard: 

Definition 5 A tuple $\bar{t}$ is an answer to a query $Q(\bar{x})$ in an instance r iff $r \models Q(\bar{t})$. ■
2.4 Consistent query answers 

Given a query $Q(\bar{x})$ to r, we want as consistent answers those tuples that are unaffected by the violations of IC, even when r violates IC.



Definition 6 [ABC99] A tuple $\bar{t}$ is a consistent answer to a query $Q(\bar{x})$ in a database instance r w.r.t. a set of integrity constraints IC iff $\bar{t}$ is an answer to query $Q(\bar{x})$ in every repair r' of r w.r.t. IC. An $\mathcal{L}$-sentence $Q$ is consistently true in r w.r.t. IC if it is true in every repair of r w.r.t. IC. In symbols:

$$r \models_{IC} Q(\bar{t}) \iff r' \models Q(\bar{t}) \text{ for every repair } r' \text{ of } r \text{ w.r.t. } IC.$$



Note: If the set of integrity constraints IC is clear from the context, we omit it for 
simplicity. 
2.5 Examples

Example 1 Consider the following instance of a relation Person:

| Name | City | Street |
|---|---|---|
| Brown | Amherst | 115 Klein |
| Brown | Amherst | 120 Maple |
| Green | Clarence | 4000 Transit |

and the functional dependency Name $\rightarrow$ City Street. Clearly, the above instance does not satisfy the dependency. There are two repairs: one is obtained by removing the first tuple, the other by removing the second. The consistent answer to the query $Person(n, c, s)$ is just the tuple (Green, Clarence, 4000 Transit). On the other hand, the query $\exists s.\ Person(n, c, s)$ has two consistent answers: (Brown, Amherst) and (Green, Clarence). Similarly, the query

$$Person(\text{Brown}, \text{Amherst}, \text{115 Klein}) \vee Person(\text{Brown}, \text{Amherst}, \text{120 Maple})$$

has true as the consistent answer. Notice that for the last two queries the approach based on removing all inconsistent tuples and evaluating the original query using the remaining tuples gives different, less informative results.
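Example 1 can be replayed mechanically. The sketch below relies on the fact that Name is a key here (the FD Name $\rightarrow$ City Street determines all other attributes), so each repair keeps exactly one tuple per name. Consistent answers are then the intersection of the query answers over all repairs. As an encoding assumption, a closed query returns `{()}` when true and `set()` when false; the helper names are hypothetical. For contrast, it also evaluates the queries after deleting every tuple involved in a violation.

```python
from collections import defaultdict
from itertools import product

PERSON = [
    ("Brown", "Amherst", "115 Klein"),
    ("Brown", "Amherst", "120 Maple"),
    ("Green", "Clarence", "4000 Transit"),
]


def key_repairs(rel):
    """Repairs under a key on the first attribute: pick one tuple per key value."""
    groups = defaultdict(list)
    for t in rel:
        groups[t[0]].append(t)
    for choice in product(*groups.values()):
        yield set(choice)


def consistent_answers(rel, query):
    """Tuples returned by `query` in every repair (Definition 6)."""
    return set.intersection(*(query(rep) for rep in key_repairs(rel)))


def drop_conflicting(rel):
    """The naive alternative: delete every tuple that shares its key with another tuple."""
    counts = defaultdict(int)
    for t in rel:
        counts[t[0]] += 1
    return {t for t in rel if counts[t[0]] == 1}


q_all = lambda rep: set(rep)                        # Person(n, c, s)
q_proj = lambda rep: {(n, c) for (n, c, _) in rep}  # exists s. Person(n, c, s)
q_or = lambda rep: {()} if ({("Brown", "Amherst", "115 Klein"),
                             ("Brown", "Amherst", "120 Maple")} & set(rep)) else set()

print(consistent_answers(PERSON, q_all))
print(consistent_answers(PERSON, q_proj))
print(consistent_answers(PERSON, q_or), q_or(drop_conflicting(PERSON)))
```

Enumerating repairs this way is only viable because the instance is tiny; the number of repairs can grow exponentially, as discussed in Section 2.7.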

Example 2 Consider a database with two relations Employee(SSN, Name) and Manager(SSN). There are functional dependencies SSN $\rightarrow$ Name and Name $\rightarrow$ SSN, and an inclusion dependency Manager[SSN] $\subseteq$ Employee[SSN]. The relations have the following instances:

Employee

| SSN | Name |
|---|---|
| 123456789 | Smith |
| 555555555 | Jones |
| 555555555 | Smith |

Manager

| SSN |
|---|
| 123456789 |
| 555555555 |

The instances do not violate the IND but violate both FDs. If we consider only the FDs, there are two repairs: one obtained by removing the third tuple from Employee, and the other by removing the first two tuples from the same relation. However, the second repair violates the IND. This can be fixed by removing the first tuple from Manager. So if we consider all the constraints, there are two repairs:

Employee

| SSN | Name |
|---|---|
| 123456789 | Smith |
| 555555555 | Jones |

Manager

| SSN |
|---|
| 123456789 |
| 555555555 |

and

Employee

| SSN | Name |
|---|---|
| 555555555 | Smith |

Manager

| SSN |
|---|
| 555555555 |
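The two repairs of Example 2 can be confirmed by exhaustive search over the subsets of the five facts. In this sketch, facts are tagged tuples and `satisfies_ic` hard-codes the two FDs and the IND of the example; this encoding is an illustrative assumption, not part of the paper.

```python
from itertools import combinations

FACTS = [
    ("Employee", "123456789", "Smith"),
    ("Employee", "555555555", "Jones"),
    ("Employee", "555555555", "Smith"),
    ("Manager", "123456789"),
    ("Manager", "555555555"),
]


def satisfies_ic(inst):
    emp = [f for f in inst if f[0] == "Employee"]
    for (_, ssn1, name1) in emp:
        for (_, ssn2, name2) in emp:
            if ssn1 == ssn2 and name1 != name2:  # SSN -> Name
                return False
            if name1 == name2 and ssn1 != ssn2:  # Name -> SSN
                return False
    emp_ssns = {f[1] for f in emp}
    # Manager[SSN] ⊆ Employee[SSN]
    return all(f[1] in emp_ssns for f in inst if f[0] == "Manager")


def repairs(facts):
    """Maximal consistent subsets, visited from largest to smallest."""
    found = []
    for size in range(len(facts), -1, -1):
        for subset in combinations(facts, size):
            candidate = frozenset(subset)
            if satisfies_ic(candidate) and not any(candidate < rep for rep in found):
                found.append(candidate)
    return found


for rep in repairs(FACTS):
    print(sorted(rep))
```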
Example 3 We give here some examples of denial constraints. Consider the relation Emp with attributes Name, Salary, and Manager, with Name being the primary key. The constraint that no employee can have a salary greater than that of her manager is a denial constraint:

$$\forall n, s, m, s', m'.\ \neg[Emp(n, s, m) \wedge Emp(m, s', m') \wedge s > s'].$$

Similarly, single-tuple constraints (CHECK constraints in SQL2) are a special case of denial constraints. For example, the constraint that no employee can have a salary over 200000 dollars is expressed as:

$$\forall n, s, m.\ \neg[Emp(n, s, m) \wedge s > 200000].$$

Note that a single-tuple constraint always leads to a single repair, which consists of all the tuples of the original instance that satisfy the constraint.



2.6 Different notions of repair 



The original notion of repair introduced in [ABC99] required that the symmetric difference between a database and its repair be minimized. As explained in the introduction, this was based on the assumption that the database may be not only inconsistent but also incomplete. The notion of repair pursued in the current paper (Definition 4) reflects the assumption that the database is complete. There are several reasons for this change of perspective. First, for denial constraints integrity violations can only be removed by deleting tuples, so the different notions of repair in fact coincide in this case. Therefore, none of the results presented in Section 3 are affected by the restriction of the repairs to be subsets of the original instance. Insertions can restore integrity only for inclusion dependencies (or, in general, for tuple-generating dependencies [AHV95]). Second, even for inclusion dependencies, current language standards like SQL:1999 allow only deletions in their repertoire of referential integrity actions. Third, disallowing insertions significantly strengthens the notion of consistent query answer, as demonstrated by the following example.

Example 4 Consider a database schema consisting of two relations $P(AB)$ and $S(C)$. The integrity constraints are: the FD $A \rightarrow B$ and the IND $B \subseteq C$. Assume the database instance $r_1$ consists of $p = \{(a, b), (a, c)\}$ and $s = \{b\}$. Then under Definition 4 there is only one repair $r_2$, consisting of $p' = \{(a, b)\}$ and $s' = s$. On the other hand, under the definition of [ABC99], there is one more repair $r_3$, consisting of $p'' = \{(a, c)\}$ and $s'' = \{b, c\}$. Therefore, in the first case $P(a, b)$ is consistently true in the original instance $r_1$, while in the second case it is not. Note that $P(a, c)$ is not consistently true in $r_1$ either. Thus, in the second case $P(a, b)$ and $P(a, c)$ are treated symmetrically from the point of view of consistent query answering. However, intuitively there is a difference between them. Think of A as being the person's name, B her address, and S a list of valid addresses. Then only under Definition 4 would the single valid address be returned as a consistent answer.

Finally, insertions may lead to infinitely many repairs which are, moreover, not very intuitive 
as ways of fixing an inconsistent database. 

Example 5 In Example 2, allowing insertions additionally gives infinitely many repairs of the form

Employee

| SSN | Name |
|---|---|
| 123456789 | c |
| 555555555 | Smith |

Manager

| SSN |
|---|
| 123456789 |
| 555555555 |

where c is an arbitrary string different from Smith.
2.7 Computational Problems 

Assume a class of databases $\mathcal{D}$, a class of queries $\mathcal{Q}$, and a class of integrity constraints $\mathcal{C}$ are given. We study here the complexity of the following problems:

• repair checking, i.e., the complexity of the set

$$B_{IC} = \{(r, r') : r, r' \in \mathcal{D} \wedge r' \in \mathit{Repairs}_{IC}(r)\},$$

• consistent query answers, i.e., the complexity of the set

$$D_{IC,\Phi} = \{r : r \in \mathcal{D} \wedge r \models_{IC} \Phi\},$$

for a fixed sentence $\Phi \in \mathcal{Q}$ and a fixed finite set $IC \in \mathcal{C}$ of integrity constraints. This formulation is called data complexity [CH80, Var82], since it captures the complexity of a problem as a function of the number of tuples in the database instance only. The database schema, the query, and the integrity constraints are assumed to be fixed.

It is easy to see that even under a single key FD, there may be exponentially many 
repairs and thus the approach to computing consistent query answers by generating and 
examining all repairs is not feasible. 

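Example 6 Consider the functional dependency $A \rightarrow B$ and the following family of relation instances $r_n$, $n > 0$, each of which has $2n$ tuples (represented as columns) and $2^n$ repairs:

$$\begin{array}{c|ccccccc} A & a_1 & a_1 & a_2 & a_2 & \cdots & a_n & a_n \\ B & b_0 & b_1 & b_0 & b_1 & \cdots & b_0 & b_1 \end{array}$$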
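The count $2^n$ in Example 6 follows because each value $a_i$ independently keeps either its $b_0$ tuple or its $b_1$ tuple. A short sketch (function names hypothetical) computes the number of repairs of a two-attribute relation under $A \rightarrow B$ as a product over A-groups, and cross-checks it against exhaustive enumeration for a small n.

```python
from collections import defaultdict
from itertools import combinations
from math import prod


def r_n(n):
    """The instance r_n of Example 6: tuples (a_i, b_0) and (a_i, b_1) for i = 1..n."""
    return [(f"a{i}", b) for i in range(1, n + 1) for b in ("b0", "b1")]


def count_repairs_fd(rel):
    """Under A -> B, a repair keeps, for each A-value, the tuples with one chosen B-value."""
    b_values = defaultdict(set)
    for a, b in rel:
        b_values[a].add(b)
    return prod(len(bs) for bs in b_values.values())


def count_repairs_brute(rel):
    """Count maximal subsets satisfying A -> B by exhaustive search."""
    def ok(s):
        return all(not (x[0] == y[0] and x[1] != y[1]) for x in s for y in s)

    found = []
    for size in range(len(rel), -1, -1):
        for subset in combinations(rel, size):
            candidate = frozenset(subset)
            if ok(candidate) and not any(candidate < rep for rep in found):
                found.append(candidate)
    return len(found)


print(count_repairs_fd(r_n(20)))  # 1048576 = 2**20 repairs from only 40 tuples
```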



We establish below a general relationship between the problems of repair checking and 
consistent query answers. 

Theorem 1 In the presence of foreign key constraints, the problem of repair checking is logspace-reducible to the complement of the problem of consistent query answers.

Proof. We discuss here the case of the database consisting of a single relation $R_0$. Assume r is the given instance of $R_0$ and r' is another instance of $R_0$ satisfying the set of integrity constraints IC. We define a new relation $S_0$ having the same attributes as $R_0$ plus an additional attribute Z. Consider an instance s of $S_0$ built as follows:

• for every tuple $(x_1, \ldots, x_k) \in r'$, we add the tuple $(x_1, \ldots, x_k, c_1)$ to s;

• for every tuple $(x_1, \ldots, x_k) \in r - r'$, we add the tuple $(x_1, \ldots, x_k, c_2)$ to s.

Consider also another relation P having a single attribute W, and a foreign key constraint $i_0: P[W] \subseteq S_0[Z]$. The instance p of P consists of a single tuple $c_2$. We claim that $P(c_2)$ is consistently true in the database instance consisting of s and p w.r.t. $IC \cup \{i_0\}$ iff r' is not a repair of r w.r.t. IC. ■
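The reduction in the proof of Theorem 1 can be exercised on the Person relation of Example 1. This sketch assumes IC is the key FD Name $\rightarrow$ City Street (applied to $S_0$ with Z ignored) and encodes the constants $c_1, c_2$ as strings; it builds s and p and decides by brute force over repairs whether $P(c_2)$ is consistently true. The helper names are hypothetical, and this illustrates the claim rather than a logspace implementation.

```python
from itertools import combinations

PERSON = {
    ("Brown", "Amherst", "115 Klein"),
    ("Brown", "Amherst", "120 Maple"),
    ("Green", "Clarence", "4000 Transit"),
}


def reduction_instance(r, r_prime):
    """Facts of s (relation S0 with the extra attribute Z) together with p = {P(c2)}."""
    s = {t + ("c1",) for t in r_prime} | {t + ("c2",) for t in r - r_prime}
    return {("S0",) + t for t in s} | {("P", "c2")}


def consistent(facts):
    s0 = [f[1:] for f in facts if f[0] == "S0"]
    # The original FD Name -> City Street, evaluated on S0 with Z ignored.
    if any(a[0] == b[0] and a[1:3] != b[1:3] for a in s0 for b in s0):
        return False
    z_values = {t[3] for t in s0}
    # The foreign key i0: P[W] ⊆ S0[Z].
    return all(f[1] in z_values for f in facts if f[0] == "P")


def repairs(facts):
    facts = list(facts)
    found = []
    for size in range(len(facts), -1, -1):
        for subset in combinations(facts, size):
            candidate = frozenset(subset)
            if consistent(candidate) and not any(candidate < rep for rep in found):
                found.append(candidate)
    return found


def p_c2_consistently_true(r, r_prime):
    return all(("P", "c2") in rep for rep in repairs(reduction_instance(r, r_prime)))


GREEN = ("Green", "Clarence", "4000 Transit")
print(p_c2_consistently_true(PERSON, {("Brown", "Amherst", "115 Klein"), GREEN}))  # r' is a repair
print(p_c2_consistently_true(PERSON, {GREEN}))  # r' is consistent but not maximal
```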
3 Denial constraints
3.1 Conflict hypergraph 

Given a set of denial constraints F and an instance r, all the repairs of r with respect to F can be succinctly represented as the conflict hypergraph. This is a generalization of the conflict graph defined in [ABC01] for FDs only.



Definition 7 The conflict hypergraph $\mathcal{G}_{F,r}$ is a hypergraph whose set of vertices is the set $S(r)$ of facts of an instance r and whose set of edges consists of all the sets

$$\{P_1(\bar{t}_1), P_2(\bar{t}_2), \ldots, P_l(\bar{t}_l)\}$$

such that $P_1(\bar{t}_1), P_2(\bar{t}_2), \ldots, P_l(\bar{t}_l) \in S(r)$, and there is a constraint

$$\forall \bar{x}_1, \bar{x}_2, \ldots, \bar{x}_l.\ \neg[P_1(\bar{x}_1) \wedge P_2(\bar{x}_2) \wedge \cdots \wedge P_l(\bar{x}_l) \wedge \varphi(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_l)]$$

in F such that $P_1(\bar{t}_1), P_2(\bar{t}_2), \ldots, P_l(\bar{t}_l)$ together violate this constraint, which means that there exists a substitution $\rho$ such that $\rho(\bar{x}_1) = \bar{t}_1, \rho(\bar{x}_2) = \bar{t}_2, \ldots, \rho(\bar{x}_l) = \bar{t}_l$ and that $\varphi(\bar{t}_1, \bar{t}_2, \ldots, \bar{t}_l)$ is true.

Note that there may be edges in $\mathcal{G}_{F,r}$ that contain only one vertex. Also, the size of the conflict hypergraph is polynomial in the number of tuples in the database instance.

By an independent set in a hypergraph we mean a subset of its set of vertices which 
does not contain any edge. 
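A direct construction of the conflict hypergraph, together with the independence test, might look as follows. As a modeling assumption, each denial constraint is given by its number of atoms l and a predicate over l facts that encodes both the atom matching and the built-in condition $\varphi$. Trying every l-tuple of facts (with repetition, so single-vertex edges can arise) takes $O(|r|^l)$ time, which is polynomial for a fixed F. The constraints below are hypothetical instances of the two shapes in Example 3.

```python
from itertools import product
from typing import Callable, FrozenSet, Hashable, Iterable, List, Set, Tuple

Fact = Hashable
Denial = Tuple[int, Callable[..., bool]]  # (number of atoms l, violation test on l facts)


def conflict_hypergraph(facts: Iterable[Fact], constraints: List[Denial]) -> Set[FrozenSet[Fact]]:
    facts = list(facts)
    edges: Set[FrozenSet[Fact]] = set()
    for arity, violates in constraints:
        for combo in product(facts, repeat=arity):
            if violates(*combo):
                edges.add(frozenset(combo))
    return edges


def is_independent(vertices: Set[Fact], edges: Set[FrozenSet[Fact]]) -> bool:
    """An independent set contains no edge entirely."""
    return not any(edge <= vertices for edge in edges)


# Hypothetical Emp(name, salary, manager) facts, cf. Example 3.
EMP = [("ann", 250000, "bob"), ("bob", 150000, "bob"), ("cy", 90000, "bob")]
over_manager = (2, lambda e, m: e[2] == m[0] and e[1] > m[1])  # salary above the manager's
salary_cap = (1, lambda e: e[1] > 200000)                      # CHECK-style single-tuple constraint

edges = conflict_hypergraph(EMP, [over_manager, salary_cap])
print(sorted(sorted(name for name, _, _ in edge) for edge in edges))
```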

Proposition 1 Each repair of r w.r.t. F corresponds to a maximal independent set in $\mathcal{G}_{F,r}$.

Proposition 1 yields the following result:



Proposition 2 [ABC+03] For every set of denial constraints F and $\mathcal{L}$-sentence $\Phi$, $B_F$ is in PTIME and $D_{F,\Phi}$ is in co-NP. ■
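Proposition 1 gives one way to see why $B_F$ is in PTIME: r' is a repair iff it is a subset of r, contains no edge of $\mathcal{G}_{F,r}$, and every omitted fact t would complete some edge E, i.e. $E - \{t\} \subseteq r'$. In this sketch the hypergraph is passed in explicitly as a set of frozensets; the name `is_repair` and the sample hypergraph are hypothetical.

```python
from typing import FrozenSet, Hashable, Set


def is_repair(r: Set[Hashable], r_prime: Set[Hashable], edges: Set[FrozenSet[Hashable]]) -> bool:
    if not r_prime <= r:
        return False
    if any(edge <= r_prime for edge in edges):  # r' must be independent
        return False
    # Maximality: each fact left out must close some edge together with r'.
    return all(
        any(t in edge and edge - {t} <= r_prime for edge in edges)
        for t in r - r_prime
    )


# Hypothetical hypergraph on facts 1..4 with edges {1, 2} and {2, 3, 4}.
R = {1, 2, 3, 4}
EDGES = {frozenset({1, 2}), frozenset({2, 3, 4})}
print([is_repair(R, cand, EDGES) for cand in ({1, 3, 4}, {2, 3}, {2, 4}, {1, 3})])
# [True, True, True, False]
```

The test makes one pass over the omitted facts and the edges, so it runs in time polynomial in the size of the hypergraph.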

Note that the repairs of an instance r can be computed nondeterministically by picking a vertex of $\mathcal{G}_{F,r}$ which does not belong to a single-vertex edge and adding vertices that do not result in the addition of an entire edge.
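The nondeterministic procedure just described can be simulated by fixing a random order of the facts and greedily keeping each fact that does not complete an edge. The sketch below (hypothetical names; `random.Random` supplies the order) always returns a maximal independent set, because a fact rejected once stays blocked as the kept set only grows. Every repair is reachable from some order: place its own facts first.

```python
import random
from typing import FrozenSet, Hashable, List, Optional, Set


def greedy_repair(facts: List[Hashable], edges: Set[FrozenSet[Hashable]],
                  rng: Optional[random.Random] = None) -> Set[Hashable]:
    order = list(facts)
    (rng or random.Random()).shuffle(order)
    kept: Set[Hashable] = set()
    for fact in order:
        # Keep the fact unless it would complete an edge of the conflict hypergraph;
        # a fact forming a single-vertex edge is therefore never kept.
        if not any(fact in edge and edge - {fact} <= kept for edge in edges):
            kept.add(fact)
    return kept


EDGES = {frozenset({1, 2}), frozenset({2, 3, 4})}
seen = {frozenset(greedy_repair([1, 2, 3, 4], EDGES, random.Random(seed))) for seed in range(50)}
print(sorted(sorted(rep) for rep in seen))
```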

3.2 Positive results 



A set of constraints is generic if it does not imply any ground literal. The results in [ABC99] imply the following:

Proposition 3 For every generic set F of binary denial constraints and full inclusion dependencies, and quantifier-free $\mathcal{L}$-sentence

$$\Phi = P_1(\bar{x}_1) \wedge \cdots \wedge P_m(\bar{x}_m) \wedge \neg P_{m+1}(\bar{x}_{m+1}) \wedge \cdots \wedge \neg P_n(\bar{x}_n) \wedge \varphi(\bar{x}_1, \ldots, \bar{x}_n),$$

$D_{F,\Phi}$ is in PTIME. ■
The techniques in [ABC99] do not generalize to non-binary constraints, or to queries involving disjunction or quantifiers. However, non-binary constraints and disjunctions do not necessarily lead to intractability, as shown by the following theorem.



Theorem 2 For every set F of denial constraints and quantifier-free $\mathcal{L}$-sentence $\Phi$, $D_{F,\Phi}$ is in PTIME. ■

Proof. We assume the sentence is in CNF, i.e., of the form $\Phi = \Phi_1 \wedge \Phi_2 \wedge \cdots \wedge \Phi_l$, where each $\Phi_i$ is a disjunction of ground literals. $\Phi$ is true in every repair of r if and only if each of the clauses $\Phi_i$ is true in every repair. So it is enough to provide a polynomial algorithm which checks whether a given ground clause is consistently true.

It is easier to think that we are checking whether a ground clause is not consistently true. This means that we are checking whether there exists a repair r' in which $\neg\Phi_i$ is true for some i. But $\neg\Phi_i$ is of the form

$$P_1(\bar{t}_1) \wedge P_2(\bar{t}_2) \wedge \cdots \wedge P_m(\bar{t}_m) \wedge \neg P_{m+1}(\bar{t}_{m+1}) \wedge \cdots \wedge \neg P_n(\bar{t}_n),$$

where the $\bar{t}_j$'s are tuples of constants. WLOG, we assume that all the facts in the set $\{P_1(\bar{t}_1), \ldots, P_n(\bar{t}_n)\}$ are mutually distinct and that $P_1(\bar{t}_1), \ldots, P_m(\bar{t}_m) \in S(r)$ (otherwise no repair satisfies $\neg\Phi_i$).

The nondeterministic algorithm selects, for every j with $m+1 \leq j \leq n$ and $P_j(\bar{t}_j) \in S(r)$, an edge $E_j \in \mathcal{G}_{F,r}$ such that $P_j(\bar{t}_j) \in E_j$. Additionally, the following global conditions need to be satisfied: r' contains none of the facts $P_{m+1}(\bar{t}_{m+1}), \ldots, P_n(\bar{t}_n)$, and there is no edge $E \in \mathcal{G}_{F,r}$ such that $E \subseteq r'$, where

$$r' = \{P_1(\bar{t}_1), \ldots, P_m(\bar{t}_m)\} \cup \bigcup_{m+1 \leq j \leq n,\ P_j(\bar{t}_j) \in S(r)} \left(E_j - \{P_j(\bar{t}_j)\}\right).$$

If the selection succeeds, then a repair in which $\neg\Phi_i$ is true can be built by adding to r' new tuples from r until the set is maximal independent. The algorithm needs $n - m$ nondeterministic steps, a number which is independent of the size of the database (but dependent on $\Phi$), and in each of its nondeterministic steps it selects one possibility from a set whose size is polynomial in the size of the database. So there is an equivalent PTIME deterministic algorithm. ■
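A deterministic rendering of this algorithm is sketched below, with the conflict hypergraph given explicitly as a set of frozensets. It tries every combination of edges $E_j$ (polynomially many for a fixed clause), builds r', and accepts if r' contains no edge and none of the negated facts. As an assumption of this sketch, a clause is a list of `(positive, fact)` pairs; a CNF sentence is consistently true iff all of its clauses are. All names are hypothetical.

```python
from itertools import product
from typing import FrozenSet, Hashable, Iterable, List, Set, Tuple

Edges = Set[FrozenSet[Hashable]]
Clause = List[Tuple[bool, Hashable]]  # (True, f) is the literal f, (False, f) is "not f"


def some_repair_satisfies(r: Set[Hashable], edges: Edges,
                          pos: Iterable[Hashable], neg: Iterable[Hashable]) -> bool:
    """Is there a repair containing every fact in `pos` and no fact in `neg`?"""
    pos_set, neg_set = frozenset(pos), frozenset(neg)
    if not pos_set <= r or pos_set & neg_set:
        return False
    present = [t for t in neg_set if t in r]  # negated facts outside r are false anyway
    options = [[e for e in edges if t in e] for t in present]
    for chosen in product(*options):
        core = set(pos_set)
        for t, e in zip(present, chosen):
            core |= e - {t}  # keeping E_j - {t_j} blocks t_j from every extension
        if not core & neg_set and not any(e <= core for e in edges):
            return True  # extending core to a maximal independent set gives the repair
    return False


def clause_consistently_true(r: Set[Hashable], edges: Edges, clause: Clause) -> bool:
    # The clause fails in some repair iff its negation (a conjunction of literals) holds there.
    must_hold = [f for positive, f in clause if not positive]
    must_fail = [f for positive, f in clause if positive]
    return not some_repair_satisfies(r, edges, must_hold, must_fail)


def cnf_consistently_true(r: Set[Hashable], edges: Edges, cnf: List[Clause]) -> bool:
    return all(clause_consistently_true(r, edges, c) for c in cnf)


# Hypothetical instance: p and q conflict, so the repairs are {p, s} and {q, s}.
R = {"p", "q", "s"}
EDGES = {frozenset({"p", "q"})}
print(cnf_consistently_true(R, EDGES, [[(True, "p"), (True, "q")], [(True, "s")]]))
```

If a negated fact in r belongs to no edge, it lies in every repair, and the empty product of options correctly makes the clause consistently true.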

In the case when the set F of integrity constraints consists of only one FD per relation, the conflict hypergraph has a very simple form: it is a disjoint union of full multipartite graphs. If this single dependency is a key dependency, then the conflict graph is a union of disjoint cliques. Because of this very simple structure, we hoped that it would be possible, in such a situation, to compute in polynomial time the consistent answers not only to ground queries, but also to all conjunctive queries. As we are going to see now, this is only possible if the conjunctive queries are suitably restricted.

Theorem 3 Let F be a set of FDs, each dependency over a different relation among $P_1, P_2, \ldots, P_k$. Then for each closed simple conjunctive query Q, there exists a sentence Q' such that for every database instance r, $r \models_F Q$ iff $r \models Q'$. Consequently, $D_{F,Q}$ is in PTIME.

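Proof. We present the construction for k = 2 for simplicity; the generalization to an arbitrary k is straightforward. Let $P_1$ and $P_2$ be two different relations of arity $k_1$ and $k_2$, resp. Assume we have the following FDs: $Y_1 \rightarrow Z_1$ over $P_1$ and $Y_2 \rightarrow Z_2$ over $P_2$. Let $\bar{y}_1$ be a vector of arity $|Y_1|$, $\bar{y}_2$ a vector of arity $|Y_2|$, $\bar{z}_1$ and $\bar{z}'_1$ vectors of arity $|Z_1|$, and $\bar{z}_2$ and $\bar{z}'_2$ vectors of arity $|Z_2|$. Finally, let $\bar{w}_1, \bar{w}'_1, \bar{w}''_1$ (resp. $\bar{w}_2, \bar{w}'_2, \bar{w}''_2$) be vectors of arity $k_1 - |Y_1| - |Z_1|$ (resp. $k_2 - |Y_2| - |Z_2|$). All of the above vectors consist of distinct variables. The query Q is of the following form:

$$\exists \bar{y}_1, \bar{z}_1, \bar{w}_1, \bar{y}_2, \bar{z}_2, \bar{w}_2.\ [P_1(\bar{y}_1, \bar{z}_1, \bar{w}_1) \wedge P_2(\bar{y}_2, \bar{z}_2, \bar{w}_2) \wedge \varphi(\bar{y}_1, \bar{z}_1, \bar{w}_1, \bar{y}_2, \bar{z}_2, \bar{w}_2)].$$

Then, the query Q' is as follows:

$$\begin{aligned} \exists \bar{y}_1, \bar{z}_1, \bar{w}_1, \bar{y}_2, \bar{z}_2, \bar{w}_2\ \forall \bar{z}'_1, \bar{w}'_1, \bar{z}'_2, \bar{w}'_2\ \exists \bar{w}''_1, \bar{w}''_2.\ [&P_1(\bar{y}_1, \bar{z}_1, \bar{w}_1) \wedge P_2(\bar{y}_2, \bar{z}_2, \bar{w}_2) \wedge \varphi(\bar{y}_1, \bar{z}_1, \bar{w}_1, \bar{y}_2, \bar{z}_2, \bar{w}_2) \\ &\wedge (P_1(\bar{y}_1, \bar{z}'_1, \bar{w}'_1) \wedge P_2(\bar{y}_2, \bar{z}'_2, \bar{w}'_2) \Rightarrow P_1(\bar{y}_1, \bar{z}'_1, \bar{w}''_1) \wedge P_2(\bar{y}_2, \bar{z}'_2, \bar{w}''_2) \wedge \varphi(\bar{y}_1, \bar{z}'_1, \bar{w}''_1, \bar{y}_2, \bar{z}'_2, \bar{w}''_2))]. \end{aligned}$$

■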

We now show that the above results are the strongest possible, since relaxing any of the restrictions leads to co-NP-completeness. This is the case even if we limit ourselves to key FDs.

3.3 One key dependency, nonsimple conjunctive query 
Theorem 4 There exist a key FD f and a closed conjunctive query

$$Q = \exists x, y, z.\ [R(x, y, c) \wedge R(z, y, c')],$$

for which $D_{\{f\},Q}$ is co-NP-complete.

Proof. Reduction from MONOTONE 3-SAT. The FD is $A \rightarrow BC$. Let $\Phi = \phi_1 \wedge \cdots \wedge \phi_m \wedge \psi_{m+1} \wedge \cdots \wedge \psi_l$ be a conjunction of clauses, such that all occurrences of variables in the $\phi_i$ are positive and all occurrences of variables in the $\psi_j$ are negative. We build a database with the facts $R(i, p, c)$ if the variable p occurs in the clause $\phi_i$ and $R(i, p, c')$ if the variable p occurs in the clause $\psi_i$. Now, there is an assignment which satisfies $\Phi$ if and only if there exists a repair of the database in which Q is false. To show the $\Rightarrow$ implication, select for each clause $\phi_i$ one variable $p_i$ which occurs in this clause and whose value is 1, and for each clause $\psi_i$ one variable $p_i$ which occurs in $\psi_i$ and whose value is 0. The set of facts $\{R(i, p_i, c) : i \leq m\} \cup \{R(i, p_i, c') : m + 1 \leq i \leq l\}$ is a repair in which the query Q is false. The $\Leftarrow$ implication is even simpler. ■
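The reduction can be checked on small formulas. This sketch (all names hypothetical) encodes clauses as sets of variable names, builds the facts $R(i, p, c)$ and $R(i, p, c')$, enumerates the repairs under the key $A \rightarrow BC$ (one fact per clause index), and compares "some repair falsifies Q" with brute-force satisfiability.

```python
from itertools import product


def build_facts(pos_clauses, neg_clauses):
    """R(i, p, c) for p in positive clause i; R(i, p, c') for p in negative clause i."""
    facts = []
    for i, clause in enumerate(pos_clauses, start=1):
        facts += [(i, p, "c") for p in sorted(clause)]
    for i, clause in enumerate(neg_clauses, start=len(pos_clauses) + 1):
        facts += [(i, p, "c'") for p in sorted(clause)]
    return facts


def key_repairs(facts):
    """Repairs under the key A -> BC: exactly one fact per clause index."""
    groups = {}
    for fact in facts:
        groups.setdefault(fact[0], []).append(fact)
    return [set(choice) for choice in product(*groups.values())]


def q_holds(rep):
    """Q = exists x, y, z. R(x, y, c) and R(z, y, c')."""
    return any(y1 == y2 and s1 == "c" and s2 == "c'"
               for (_, y1, s1) in rep for (_, y2, s2) in rep)


def some_repair_falsifies_q(pos_clauses, neg_clauses):
    return any(not q_holds(rep) for rep in key_repairs(build_facts(pos_clauses, neg_clauses)))


def satisfiable(pos_clauses, neg_clauses):
    variables = sorted(set().union(*pos_clauses, *neg_clauses))
    for bits in product([False, True], repeat=len(variables)):
        value = dict(zip(variables, bits))
        if (all(any(value[p] for p in c) for c in pos_clauses)
                and all(any(not value[p] for p in c) for c in neg_clauses)):
            return True
    return False


print(some_repair_falsifies_q([{"p", "q"}], [{"p"}]), satisfiable([{"p", "q"}], [{"p"}]))
```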

3.4 Two key dependencies, single-atom query 

By a bipartite edge-colored graph we mean a tuple $\mathcal{G} = (V, E, B, G)$ such that $(V, E)$ is an undirected bipartite graph and $E = B \cup G$ for some given disjoint sets B, G (so we think of each edge of $\mathcal{G}$ as having one of the two colors).

Definition 8 Let $\mathcal{G} = (V, E, B, G)$ be a bipartite edge-colored graph, and let $M \subseteq E$. We say that M is maximal V-free if:

1. M is a maximal (w.r.t. inclusion) subset of E with the property that neither $M(x, y) \wedge M(x, z)$ with $y \neq z$ nor $M(x, y) \wedge M(z, y)$ with $x \neq z$ holds for any x, y, z;

2. $M \cap B = \emptyset$.
We say that $\mathcal{G}$ has the max-V-free property if there exists M which is maximal V-free.

Lemma 1 Max-V-free is an NP-complete property of bipartite edge-colored graphs. 

Proof. Reduction from 3-COLORABILITY. Let $\mathcal{H} = (U, D)$ be some undirected graph. This is how we define the bipartite edge-colored graph $\mathcal{G}_{\mathcal{H}}$:

1. $V = \{v_e, v'_e : v \in U, e \in \{m, n, r, g, b\}\}$, which means that there are 10 nodes in the graph $\mathcal{G}_{\mathcal{H}}$ for each node of $\mathcal{H}$;

2. $G(v_m, v'_r), G(v_m, v'_b), G(v_n, v'_b), G(v_n, v'_g)$ and $G(v_r, v'_m), G(v_b, v'_m), G(v_b, v'_n), G(v_g, v'_n)$ hold for each $v \in U$;

3. $B(v_e, v'_{e'})$ holds for each $v \in U$ and each pair $e, e' \in \{r, g, b\}$ such that $e \neq e'$;

4. $B(v_e, u'_e)$ holds for each $e \in \{r, g, b\}$ and each pair $u, v \in U$ such that $D(u, v)$.

Suppose that $\mathcal{H}$ is 3-colorable. We fix a coloring of $\mathcal{H}$ and construct the set M. For each v, if the color of v is Red, then the edges $G(v_m, v'_b), G(v_n, v'_g)$ and $G(v_b, v'_m), G(v_g, v'_n)$ are in M. If the color of v is Green, then the edges $G(v_m, v'_r), G(v_n, v'_b)$ and $G(v_r, v'_m), G(v_b, v'_n)$ are in M, and if the color of v is Blue, then the edges $G(v_m, v'_r), G(v_n, v'_g)$ and $G(v_r, v'_m), G(v_g, v'_n)$ are in M. It is easy to see that the set M constructed in this way is maximal V-free.

For the other direction, suppose that a maximal V-free set M exists in $\mathcal{G}_{\mathcal{H}}$. Then, for each $v \in U$ there is at least one node among $v_r, v_g, v_b$ which does not belong to any G-edge in M. Let $v_e$ be this node. Also, there is at least one such node (say, $v'_{e'}$) among $v'_r, v'_g, v'_b$. Now, it follows easily from the construction of $\mathcal{G}_{\mathcal{H}}$ that if M is maximal V-free then $e = e'$. Let this e be the color of v in $\mathcal{H}$. It is easy to check that the coloring defined in this way is a legal 3-coloring of $\mathcal{H}$. ■

Theorem 5 There is a set F of two key dependencies and a closed conjunctive query $Q = \exists x, y.\ [R(x, y, b)]$, for which $D_{F,Q}$ is co-NP-complete.

Proof. The two dependencies are $A \rightarrow BC$ and $B \rightarrow AC$. For a given bipartite edge-colored graph $\mathcal{G} = (V, E, B, G)$ we build a database with the tuples $(x, y, g)$ if $G(x, y)$ holds in $\mathcal{G}$ and $(x, y, b)$ if $B(x, y)$ holds in $\mathcal{G}$. Now the theorem follows from Lemma 1, since a repair in which the query Q is not true exists if and only if $\mathcal{G}$ has the max-V-free property. ■

3.5 One denial constraint 

By an edge-colored graph we mean a tuple $\mathcal{G} = (V, E, P, G, B)$ such that $(V, E)$ is a (directed) graph and $E = P \cup G \cup B$ for some given pairwise disjoint sets P, G, B (which we interpret as colors). We say that the edge-colored graph $\mathcal{G}$ has the Y-property if there are $x, y, z, t \in V$ such that $E(x, y)$, $E(y, z)$, $E(y, t)$ hold and the edges $E(y, z)$ and $E(y, t)$ are of different colors.

Definition 9 We say that the edge-colored graph $(V, E, P, G, B)$ has the max-Y-free property if there exists a subset M of E such that $M \cap P = \emptyset$ and:

1. $(V, M, P \cap M, G \cap M, B \cap M)$ does not have the Y-property;

2. M is a maximal (w.r.t. inclusion) subset of E satisfying the first condition.

Lemma 2 Max-Y-free is an NP-complete property of edge-colored graphs.

Proof. By a reduction of 3SAT. Let ^ = (pi f\ /\ . . . f\ 4>i he conjunction of clauses. Let 
Pi,P2, ■ ■ - Pn be all the variables in <I>. This is how we define the edge-colored graph Q^: 

1. $V = \{a_i, b_i, c_i, d_i : 1 \le i \le n\} \cup \{e_j, f_j, g_j : 1 \le j \le l\}$, which means that there are 3 nodes in the new graph for each clause in $\Phi$ and 4 nodes for each variable.

2. $P(a_i, b_i)$ and $P(e_j, f_j)$ hold for each suitable i, j;

3. $G(b_i, d_i)$ and $G(e_j, g_j)$ hold for each suitable i, j;

4. $B(b_i, c_i)$ holds for each suitable i;

5. $G(d_i, e_j)$ holds if $p_i$ occurs positively in $\phi_j$;

6. $B(d_i, e_j)$ holds if $p_i$ occurs negatively in $\phi_j$;

7. $E = B \cup G \cup P$.

Now suppose that $\Phi$ is satisfiable, and that $\pi$ is a satisfying assignment. We define the set $M \subseteq E$ as follows. We keep in M all the edges from items 3 and 4 above. If $\pi(p_i) = 1$ then we keep in M all the G-edges leaving $d_i$ (item 5). Otherwise we keep in M all the B-edges leaving $d_i$ (item 6). Obviously, $M \cap P = \emptyset$. It is also easy to see that M does not have the Y-property and that it is maximal.

In the opposite direction, notice that if an M as in Definition 9 does exist, then it must contain all the edges from items 3 and 4 above; otherwise a P-edge could be added without leading to the Y-property. But this means that, for each i, M can contain either some (or all) of the B-edges leaving $d_i$ or some (or all) of the G-edges leaving $d_i$, but not both, since $G(b_i, d_i) \in M$. In this sense M defines a valuation of the variables. Also, if M is maximal, it must contain, for each j, at least one edge leading to $e_j$ (otherwise $P(e_j, f_j)$ could be added). But this means that the defined valuation satisfies $\Phi$. ■
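The construction of $\mathcal{G}_\Phi$, together with the set M built from a satisfying assignment in the proof above, can be sketched as follows. Assumptions of this sketch: clauses are lists of nonzero integers (DIMACS style, `i` for $p_i$ and `-i` for $\neg p_i$), no clause contains both a variable and its negation, and node names such as `d1` or `e2` are illustrative strings.

```python
def build_graph(n, clauses):
    """Edge-colored graph G_Phi: dict from directed edges to 'p', 'g' or 'b'."""
    E = {}
    for i in range(1, n + 1):
        E[(f"a{i}", f"b{i}")] = "p"       # item 2
        E[(f"b{i}", f"d{i}")] = "g"       # item 3
        E[(f"b{i}", f"c{i}")] = "b"       # item 4
    for j, clause in enumerate(clauses, start=1):
        E[(f"e{j}", f"f{j}")] = "p"       # item 2
        E[(f"e{j}", f"g{j}")] = "g"       # item 3
        for lit in clause:                # items 5 and 6
            E[(f"d{abs(lit)}", f"e{j}")] = "g" if lit > 0 else "b"
    return E

def m_from_assignment(E, value):
    """Keep every non-P edge except the edges leaving some d_i whose color
    disagrees with value[i] (G-edges for True, B-edges for False)."""
    M = {}
    for (u, v), c in E.items():
        if c == "p":
            continue
        if u.startswith("d") and c != ("g" if value[int(u[1:])] else "b"):
            continue
        M[(u, v)] = c
    return M

def has_y(edges):
    out = {}
    for (u, _), c in edges.items():
        out.setdefault(u, set()).add(c)
    return any(len(out.get(v, ())) >= 2 for (_, v) in edges)

def is_max_y_free(E, M):
    return not has_y(M) and all(has_y({**M, e: c}) for e, c in E.items() if e not in M)

E = build_graph(2, [[1, -2], [2]])          # (p1 or not p2) and p2
print(is_max_y_free(E, m_from_assignment(E, {1: True, 2: True})))   # True
print(is_max_y_free(E, m_from_assignment(E, {1: True, 2: False})))  # False
```

With $p_2$ false, no edge of M enters $e_2$, so $P(e_2, f_2)$ can be added and M is not maximal, mirroring the last step of the proof.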

Theorem 6 There exist a denial constraint f and a closed conjunctive query

$Q = \exists x, y.\, R(x, y, p)$,

for which $D_{\{f\},Q}$ is co-NP-complete.

Proof. The denial constraint f is:

$\forall x, y, z, w, s, s', s''.\ \neg[R(x, y, s) \wedge R(y, z, s') \wedge R(y, w, s'') \wedge s' \neq s'']$

For a given edge-colored graph $\mathcal{G} = (V, E, P, G, B)$ we build a database with the tuples $R(x, y, g)$ if $G(x, y)$ holds in $\mathcal{G}$, $R(x, y, p)$ if $P(x, y)$ holds in $\mathcal{G}$, and $R(x, y, b)$ if $B(x, y)$ holds in $\mathcal{G}$. Now the theorem follows from Lemma 2, since a repair in which the query Q is not true exists iff $\mathcal{G}$ has the max-Y-free property. ■
4 Inclusion dependencies



Proposition 4 For every set of INDs I and $\mathcal{L}$-sentence $\Phi$, $B_I$ and $D_{I,\Phi}$ are in PTIME.

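Proof. For a given database instance r, a single repair is obtained by repeatedly deleting the tuples violating I (and only those) until no violation remains. Since the union of two subsets of r satisfying I again satisfies I, this repair is unique. ■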
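The proof can be read as a fixpoint computation. The sketch below assumes a hypothetical representation: a database is a dict from relation names to sets of tuples, and an IND $P[X] \subseteq R[Y]$ is a 4-tuple `(P, X, R, Y)` with X, Y tuples of column positions. Deletions are repeated because removing one tuple can create new violations elsewhere; each round that changes anything deletes at least one tuple, so there are at most $|r| + 1$ rounds.

```python
def ind_repair(db, inds):
    """Unique repair of db w.r.t. INDs: delete violating tuples until none remain."""
    cur = {rel: set(ts) for rel, ts in db.items()}
    changed = True
    while changed:
        changed = False
        for p, xs, r, ys in inds:
            targets = {tuple(t[k] for k in ys) for t in cur[r]}
            bad = {t for t in cur[p] if tuple(t[k] for k in xs) not in targets}
            if bad:
                cur[p] -= bad
                changed = True
    return cur

db = {
    "Cust": {("c1",)},
    "Orders": {("o1", "c1"), ("o2", "c2")},   # c2 is not a customer
    "Items": {("o1", "x"), ("o2", "y")},
}
inds = [("Items", (0,), "Orders", (0,)),     # Items[0] ⊆ Orders[0]
        ("Orders", (1,), "Cust", (0,))]      # Orders[1] ⊆ Cust[0]
rep = ind_repair(db, inds)
print(sorted(rep["Items"]))  # [('o1', 'x')]: deleting o2 from Orders cascades
```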
We now consider FDs and INDs together.



4.1 Single-key relations

We want to identify here the cases where both repair checking and computing consistent 
query answers can be done in PTIME. The intuition is to limit the interaction between the 
FDs and the INDs in the given set of integrity constraints in such a way that one can use 
the PTIME results obtained for FDs in the previous section and in [ABC+03].



Lemma 3 Let $IC = F \cup I$ be a set of constraints consisting of a set of key FDs F and a set of foreign key constraints I, with no more than one key per relation. Let r be a database instance and r' be the unique repair of r with respect to the foreign key constraints in I. Then r'' is a repair of r w.r.t. IC if and only if it is a repair of r' w.r.t. F.

Proof. The only thing to be noticed here is that repairing r' with respect to key constraints 
does not lead to new inclusion violations. This is because the set of key values in each 
relation remains unchanged after such a repair (which is not necessarily the case if we have 
relations with more than one key). ■ 

Corollary 1 Under the assumptions of Lemma 3, $B_{IC}$ is in PTIME.

Proof. Follows from Proposition ^. ■ 
The repairs w.r.t. $IC = F \cup I$ of r are computed by (deterministically) repairing r w.r.t. I and then nondeterministically repairing the result w.r.t. F (as described in the previous section).
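Under the assumptions of Lemma 3 this two-phase procedure can be sketched as an enumerator of all repairs. The sketch uses a hypothetical representation: databases as dicts from relation names to sets of tuples, INDs as `(P, X, R, Y)` with column positions, and `keys` mapping each relation to the positions of its single key; foreign keys are assumed to reference those keys. The choices in phase two are independent because, as the proof of Lemma 3 notes, the set of key values in each relation does not change.

```python
from itertools import product

def ind_repair(db, inds):
    """Phase 1: the unique repair w.r.t. the foreign keys (greatest fixpoint)."""
    cur = {rel: set(ts) for rel, ts in db.items()}
    changed = True
    while changed:
        changed = False
        for p, xs, r, ys in inds:
            targets = {tuple(t[k] for k in ys) for t in cur[r]}
            bad = {t for t in cur[p] if tuple(t[k] for k in xs) not in targets}
            if bad:
                cur[p] -= bad
                changed = True
    return cur

def all_repairs(db, keys, inds):
    """Phase 2: pick one tuple for every key value of every relation."""
    base = ind_repair(db, inds)
    groups = []                      # (relation, tuples sharing one key value)
    for rel in sorted(base):
        by_key = {}
        for t in sorted(base[rel]):
            by_key.setdefault(tuple(t[k] for k in keys[rel]), []).append(t)
        groups.extend((rel, g) for g in by_key.values())
    for pick in product(*(g for _, g in groups)):
        rep = {rel: set() for rel in base}
        for (rel, _), t in zip(groups, pick):
            rep[rel].add(t)
        yield rep

db = {"Cust": {("c1", "Ann"), ("c1", "Anne")},    # key cid violated
      "Orders": {("o1", "c1"), ("o2", "c9")}}     # o2 has no customer
keys = {"Cust": (0,), "Orders": (0,)}
inds = [("Orders", (1,), "Cust", (0,))]
for rep in all_repairs(db, keys, inds):
    print(sorted(rep["Cust"]), sorted(rep["Orders"]))
```

The order o2 disappears in the deterministic phase; the two repairs differ only in which Cust tuple is kept for the key value c1.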

We can also transfer the PTIME results about consistent query answers obtained for 
FDs only. 

Corollary 2 Let $\Phi$ be a quantifier-free $\mathcal{L}$-sentence or a simple conjunctive closed $\mathcal{L}$-query. Then under the assumptions of Lemma 3, $D_{IC,\Phi}$ is in PTIME.

Proof. From Theorem 2 and Theorem ^. ■
Unfortunately, the cases identified above are the only ones we know of in which both 
repair checking and consistent query answers are in PTIME. 



4.2 Acyclic inclusion dependencies 

For acyclic INDs (and arbitrary FDs), the repair checking problem is still in PTIME. Surprisingly, computing consistent query answers becomes a co-NP-hard problem in this case, even for key FDs and primary-key foreign key constraints. If we relax any of the assumptions of Lemma 3, the problem of consistent query answers becomes intractable, even under acyclicity.
Definition 10 [AHV95] Let I be a set of INDs over a database schema R. Consider a directed graph whose vertices are relations from R and such that there is an edge $E(P, R)$ in the graph if and only if there is an IND of the form $P[X] \subseteq R[Y]$ in I. A set of inclusion dependencies is acyclic if the above graph does not have a cycle. ■
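Acyclicity in the sense of Definition 10 can be tested in linear time, for example with Kahn's topological sort (our choice of algorithm; any cycle-detection method would do). The sketch takes each IND only as the pair (P, R) of relation names, since the attribute lists X, Y do not matter; a self-loop $P[X] \subseteq P[Y]$ counts as a cycle.

```python
from collections import defaultdict

def is_acyclic(relations, inds):
    """inds: pairs (P, R), one per IND P[X] ⊆ R[Y]."""
    succ = defaultdict(set)
    indeg = {rel: 0 for rel in relations}
    for p, r in inds:
        if r not in succ[p]:
            succ[p].add(r)
            indeg[r] += 1
    ready = [rel for rel, d in indeg.items() if d == 0]
    removed = 0
    while ready:                       # repeatedly remove a vertex with no incoming edge
        rel = ready.pop()
        removed += 1
        for nxt in succ[rel]:
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                ready.append(nxt)
    return removed == len(indeg)       # leftovers lie on a cycle or are reachable from one

print(is_acyclic(["P", "Q", "R"], [("P", "Q"), ("Q", "R")]))  # True
print(is_acyclic(["R"], [("R", "R")]))                          # False, e.g. A3 ⊆ A4 on one relation
```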

Theorem 7 Let $IC = F \cup I$ be a set of constraints consisting of a set of FDs F and an acyclic set of INDs I. Then $B_{IC}$ is in PTIME.

Proof. First compare r and r' on relations which are not on the left-hand side of any IND in I. Here, r' is a repair if and only if the functional dependencies are satisfied in r' and adding to it any additional tuple from r would violate one of the functional dependencies. Then consider relations which are on the left-hand side of some INDs, but whose inclusions only lead to already checked relations. Again, r' is a repair of those relations if and only if adding any new tuple (i.e., any tuple from r but not from r') would violate some constraint. Repeat the last step until all the relations are checked. ■
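One way to implement this check, which is our reformulation rather than the paper's wording: when I is acyclic, a consistent $r' \subseteq r$ is a repair iff no single tuple of $r \setminus r'$ can be added to r' without violating IC. (If a consistent r'' with $r' \subsetneq r'' \subseteq r$ existed, acyclicity lets us pick a relation on which r'' and r' differ while agreeing on every relation reachable from it in the IND graph; any tuple of r'' missing from r' in that relation could then be added alone.) The sketch uses a hypothetical representation: FDs as `(rel, lhs, rhs)` and INDs as `(P, X, R, Y)`, with column positions.

```python
def satisfies(db, fds, inds):
    """FDs as (rel, lhs, rhs) and INDs as (P, X, R, Y), all over column positions."""
    for rel, lhs, rhs in fds:
        seen = {}
        for t in db[rel]:
            key, val = tuple(t[i] for i in lhs), tuple(t[i] for i in rhs)
            if seen.setdefault(key, val) != val:
                return False
    for p, xs, r, ys in inds:
        targets = {tuple(t[i] for i in ys) for t in db[r]}
        if any(tuple(t[i] for i in xs) not in targets for t in db[p]):
            return False
    return True

def is_repair(r, r1, fds, inds):
    """Single-tuple test; sound only when the INDs are acyclic."""
    if any(not r1[rel] <= r[rel] for rel in r) or not satisfies(r1, fds, inds):
        return False
    for rel in r:
        for t in r[rel] - r1[rel]:
            ext = {q: set(ts) for q, ts in r1.items()}
            ext[rel].add(t)
            if satisfies(ext, fds, inds):
                return False         # t can be added, so r1 is not maximal
    return True

r = {"Dept": {("d1", "m1"), ("d1", "m2")}, "Emp": {("e1", "d1")}}
fds = [("Dept", (0,), (1,))]                    # dept -> manager
inds = [("Emp", (1,), "Dept", (0,))]            # Emp[1] ⊆ Dept[0]
print(is_repair(r, {"Dept": {("d1", "m1")}, "Emp": {("e1", "d1")}}, fds, inds))  # True
print(is_repair(r, {"Dept": {("d1", "m1")}, "Emp": set()}, fds, inds))           # False
```

With a cyclic IND the single-tuple test is not enough: for $R(A_3, A_4)$ with $A_3 \subseteq A_4$ and $r = \{(1, 2), (2, 1)\}$, the empty set passes the test although r itself is consistent, in line with the hardness results of Section 4.3.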
The above proof yields a nondeterministic PTIME procedure for computing the repairs w.r.t. $IC = F \cup I$.

To our surprise, Theorem 7 is the strongest possible positive result. The problem of consistent query answers is already intractable, even under additional restrictions on the FDs and INDs. To see this, let us start by establishing NP-completeness of the maximal spoiled-free problem.

By an instance of the maximal spoiled-free problem we will mean $\mathcal{G} = (V, V_1, V_2, V_3, S, E)$ such that:

1. $(V, E)$ is a ternary undirected hypergraph (so V is a set of vertices and E is a set of triangles);

2. $V_1, V_2, V_3$ are pairwise disjoint;

3. $V_1 \cup V_2 \cup V_3 = V$;

4. Relation E is typed: if $E(a, b, c)$ holds in $\mathcal{G}$ then $a \in V_1$, $b \in V_2$ and $c \in V_3$;

5. $S \subseteq V_1$ (S will be called the set of spoiled vertices).

We will consider maximal (with respect to inclusion) sets of disjoint triangles in $\mathcal{G}$. We call a triangle spoiled if one of its vertices is spoiled. The maximal spoiled-free problem is defined as the problem of deciding, for a given instance $\mathcal{G} = (V, V_1, V_2, V_3, S, E)$, whether there exists a maximal set $T \subseteq E$ of disjoint triangles such that none of the triangles in T is spoiled. It is easy to get confused here, so let us explain that the problem we are considering is not the existence of a set of disjoint triangles which would be maximal in the class of sets not containing a spoiled triangle: such a set of course always exists. The problem we consider is the existence of a set of disjoint triangles in $\mathcal{G}$ which is not only maximal but also does not contain a spoiled triangle.
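For very small instances the property can be decided by exhaustive search. In this sketch (our own representation) triangles are triples `(a, b, c)` with $a \in V_1$, $b \in V_2$, $c \in V_3$, the vertex names of $V_1, V_2, V_3$ are assumed distinct, and S is a set of $V_1$ vertices. A candidate T is maximal iff every triangle outside T shares a vertex with some triangle of T.

```python
from itertools import combinations

def maximal_spoiled_free(E, S):
    E = list(E)
    clean = [t for t in E if t[0] not in S]       # only V1 vertices can be spoiled
    for k in range(len(clean) + 1):
        for T in combinations(clean, k):
            used = [v for t in T for v in t]
            if len(used) != len(set(used)):
                continue                          # triangles of T must be disjoint
            used = set(used)
            if all(used & set(t) for t in E if t not in T):
                return True                       # T is maximal and spoiled-free
    return False

# The spoiled triangle (s, b1, c1) is blocked by a clean triangle sharing b1.
print(maximal_spoiled_free([("s", "b1", "c1"), ("a", "b1", "c2")], {"s"}))  # True
print(maximal_spoiled_free([("s", "b1", "c1"), ("a", "b2", "c2")], {"s"}))  # False
```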

Lemma 4 The maximal spoiled-free problem is NP-complete. 
Proof. By a reduction of 3-colorability. Let $\mathcal{H} = (U, D)$ be some undirected graph. We are going to construct the instance $\mathcal{G}_{\mathcal{H}}$ of the maximal spoiled-free problem. The construction is a little bit complicated, and we hope to simplify the presentation by the following convention:

Each vertex in $V_1$ belongs to exactly one triangle in E. So a triangle is fully specified by its vertex in $V_2$, its vertex in $V_3$ and by the information whether it is spoiled or not.

Now, for each vertex v in U we will have vertices $v_r, v_g, v_b, v_p, v_q$ in $V_2$ and vertices $v'_r, v'_g, v'_b, v'_p, v'_q$ in $V_3$. The only nonspoiled triangles will be those defined by the following pairs: $[v_r, v'_p]$, $[v_g, v'_p]$, $[v_g, v'_q]$, $[v_b, v'_q]$, $[v_p, v'_r]$, $[v_p, v'_g]$, $[v_q, v'_g]$, $[v_q, v'_b]$ (so we have 8 nonspoiled triangles for each vertex in U).

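There are two kinds of spoiled triangles. For each $v \in U$, and for each pair $e, \varepsilon \in \{r, g, b\}$ such that $e \neq \varepsilon$, there is a spoiled triangle $[v_e, v'_\varepsilon]$ in $\mathcal{G}_{\mathcal{H}}$. For each $v, u \in U$ such that $D(v, u)$ holds in $\mathcal{H}$, and for each $e \in \{r, g, b\}$, there is a spoiled triangle $[v_e, u'_e]$ in $\mathcal{G}_{\mathcal{H}}$.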

Now we need to show that $\mathcal{H}$ is 3-colorable if and only if there exists a maximal set $T \subseteq E$ of disjoint triangles such that none of the triangles in T is spoiled.

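Let us start from the $\Rightarrow$ direction, which is simple. Consider a coloring of $\mathcal{H}$ with colors r, g and b. Now take T as the set containing, for each vertex v of $\mathcal{H}$ with some color e, the four nonspoiled triangles $[v_\alpha, v'_\beta]$ that together cover every vertex among $v_r, v_g, v_b, v_p, v_q, v'_r, v'_g, v'_b, v'_p, v'_q$ except $v_e$ and $v'_e$ (for each e there is exactly one such set of four disjoint nonspoiled triangles; for $e = r$ it is $[v_g, v'_p], [v_b, v'_q], [v_p, v'_g], [v_q, v'_b]$). Obviously, T defined in this way does not contain spoiled triangles. A simple analysis shows that it is also maximal.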

For the other direction, suppose that there is a set T of disjoint triangles in $\mathcal{G}_{\mathcal{H}}$ which is maximal and only contains nonspoiled triangles. It is easy to see that for each v exactly one of the vertices $v_r, v_g, v_b$ is not in any triangle in T, and that also among $v'_r, v'_g, v'_b$ there is exactly one which is not in any triangle in T (in nonspoiled triangles these vertices can only be covered through $v'_p, v'_q$, respectively $v_p, v_q$, and maximality forces all four of $v_p, v_q, v'_p, v'_q$ to be covered). If they were different, in the sense that the first of them were $v_e$ and the second $v'_\varepsilon$ for $e \neq \varepsilon$, then the spoiled triangle $[v_e, v'_\varepsilon]$ could be added to T, which contradicts its maximality. So they are equal, and in a natural way they define a color of v. Now we need to prove that the coloring of $\mathcal{H}$ defined in this way is a legal one. But if $D(u, v)$ holds in $\mathcal{H}$ then there is a spoiled triangle $[v_e, u'_e]$ in $\mathcal{G}_{\mathcal{H}}$ for each $e \in \{r, g, b\}$. So if the colors of v and u were both equal to some e, then we could add this spoiled triangle, and T would not be maximal. ■
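The construction of $\mathcal{G}_{\mathcal{H}}$ and the $\Rightarrow$ direction of this proof can be sketched for experimenting with small graphs. Following the convention above, each triangle gets its own fresh $V_1$ vertex and is stored as `(v1, v2, v3, spoiled)`. The table `MATCH`, which lists for each color e the four disjoint nonspoiled triangles leaving exactly $v_e$ and $v'_e$ uncovered, is our reading of the construction, and all names are hypothetical.

```python
# Nonspoiled pairs [v_x, v'_y] of the construction, written as (x, y).
PAIRS = [("r", "p"), ("g", "p"), ("g", "q"), ("b", "q"),
         ("p", "r"), ("p", "g"), ("q", "g"), ("q", "b")]
# For each color e, the four disjoint pairs leaving exactly v_e and v'_e uncovered.
MATCH = {"r": [("g", "p"), ("b", "q"), ("p", "g"), ("q", "b")],
         "g": [("r", "p"), ("b", "q"), ("p", "r"), ("q", "b")],
         "b": [("r", "p"), ("g", "q"), ("p", "r"), ("q", "g")]}

def gadget(U, D):
    """Triangles (v1, v2, v3, spoiled); every triangle gets a fresh V1 vertex.
    D lists each undirected edge once, so both orientations are added."""
    tris = []
    def add(x, y, spoiled):
        tris.append((f"w{len(tris)}", x, y, spoiled))
    for v in U:
        for a, b in PAIRS:
            add(f"{v}_{a}", f"{v}'_{b}", False)
        for e in "rgb":
            for c in "rgb":
                if e != c:
                    add(f"{v}_{e}", f"{v}'_{c}", True)
    for v, u in D:
        for e in "rgb":
            add(f"{v}_{e}", f"{u}'_{e}", True)
            add(f"{u}_{e}", f"{v}'_{e}", True)
    return tris

def t_from_coloring(tris, coloring):
    chosen = {(f"{v}_{a}", f"{v}'_{b}") for v, e in coloring.items() for a, b in MATCH[e]}
    return [t for t in tris if not t[3] and (t[1], t[2]) in chosen]

def is_maximal_spoiled_free(tris, T):
    used = [x for t in T for x in t[:3]]
    if len(used) != len(set(used)) or any(t[3] for t in T):
        return False
    used_set = set(used)
    return all(set(t[:3]) & used_set for t in tris if t not in T)

U, D = ["x", "y", "z"], [("x", "y"), ("y", "z"), ("x", "z")]
tris = gadget(U, D)
print(is_maximal_spoiled_free(tris, t_from_coloring(tris, {"x": "r", "y": "g", "z": "b"})))  # True
print(is_maximal_spoiled_free(tris, t_from_coloring(tris, {"x": "r", "y": "r", "z": "b"})))  # False
```

With the illegal coloring, $x_r$ and $y'_r$ both stay uncovered, so the spoiled triangle $[x_r, y'_r]$ can be added and T is not maximal.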

Theorem 8 There exist a database schema, a set IC of integrity constraints consisting of key FDs and of an acyclic set of primary-key foreign key constraints, and a ground atomic query $\Phi$ such that $D_{IC,\Phi}$ is co-NP-hard.

Proof. The schema consists of a unary relation P, a binary relation $Q(Q_1, Q_2)$ and a ternary relation $R(R_1, R_2, R_3)$. The columns $Q_1, R_1, R_2, R_3$ are keys, with $Q_1$ and $R_1$ being the primary keys. The foreign key dependencies are $P \subseteq Q_1$ and $Q_2 \subseteq R_1$. For a given instance $\mathcal{G}$ of the maximal spoiled-free problem we will construct a database instance r and a query $\Phi$ such that $\mathcal{G}$ has the maximal spoiled-free property if and only if there is a repair r' of r with respect to IC such that $\Phi$ is not true in r'.

We define the relation P as a single fact $P(a)$. The relation Q is defined as the set of facts $\{Q(a, s) : s \in S\}$, where S is the set of spoiled vertices of $\mathcal{G}$. Finally, R is the hypergraph E of $\mathcal{G}$. The query $\Phi$ is $P(a)$.

The repairs of R with respect to the key dependencies correspond to maximal sets of disjoint triangles in $\mathcal{G}$. If $\mathcal{G}$ has the maximal spoiled-free property then there exists a repair of R which does not contain any tuple of the form $R(s, u, v)$ with $s \in S$. But then the only way to repair Q is to take the empty relation and, consequently, the only way to repair P is to take the empty relation. So if $\mathcal{G}$ has the maximal spoiled-free property then $\Phi$ is indeed not true in all repairs. For the other direction, notice that if each repair of R contains a tuple of the form $R(s, u, v)$ with $s \in S$, then each repair of Q is nonempty and, in consequence, each repair of P consists of the single atom $P(a)$, so $\Phi$ is indeed true in all repairs. ■

4.3 Relaxing acyclicity 

We show here that relaxing the acyclicity assumption in Theorem 7 leads to the intractability of the repair checking problem (and thus also of the problem of consistent query answers), even when alternative restrictions on the integrity constraints are imposed.

4.3.1 One FD, one IND 

Theorem 9 There exist a database schema and a set IC of integrity constraints, consisting of one FD and one IND, such that $B_{IC}$ is co-NP-hard.

Proof. We will check here whether the empty set is a repair. The database schema consists of one relation $R(A_1, A_2, A_3, A_4)$ and the constraints in IC are $A_1 \to A_2$ and $A_3 \subseteq A_4$.

Consider a propositional formula $\Phi = \phi_1 \wedge \phi_2 \wedge \ldots \wedge \phi_m$, where the $\phi_i$ are clauses. Let $r_\Phi$ consist of the facts $R(p_j, 0, \phi_i, \phi_{i+1})$ such that $p_j$ occurs negatively in $\phi_i$ and of the facts $R(p_j, 1, \phi_i, \phi_{i+1})$ such that $p_j$ occurs positively in $\phi_i$, where the addition $i + 1$ is meant modulo the number m of clauses in $\Phi$. We want to show that $\emptyset$ is a repair of $r_\Phi$ with respect to IC if and only if $\Phi$ is not satisfiable.

For the only if direction, notice that if $\rho$ is a satisfying assignment of $\Phi$ then the subset of $r_\Phi$ consisting of all the facts of the form $R(p, \rho(p), \phi_i, \phi_{i+1})$ is a repair, and obviously $\emptyset$ is not a repair then.

For the opposite direction, first notice that a nonempty repair r' of $r_\Phi$ contains some fact of the form $R(\_, \_, \phi_i, \phi_{i+1})$. So, by the inclusion $A_3 \subseteq A_4$, it must also contain some fact of the form $R(\_, \_, \phi_{i-1}, \phi_i)$. By induction we show that

(*) for every clause $\phi_j$ from $\Phi$ there is a fact of the form $R(\_, \_, \phi_j, \phi_{j+1})$ in r'.

Now we make use of the functional dependency $A_1 \to A_2$. If r' is a repair of $r_\Phi$ then for each variable p there are either only facts of the form $R(p, 0, \_, \_)$ in r' or only facts of the form $R(p, 1, \_, \_)$. Define the assignment $\rho(p)$ as 1 if there is some fact of the form $R(p, 1, \_, \_)$ in r' and as 0 otherwise. It follows from the construction of $r_\Phi$ that if a fact of the form $R(\_, \_, \phi_j, \phi_{j+1})$ is in r' then $\rho$ satisfies $\phi_j$. Together with (*) this completes the proof. ■
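The construction of $r_\Phi$ and the equivalence just proved can be tried out with a brute-force sketch; the encoding (clauses as lists of DIMACS-style integers, clause $\phi_i$ represented by its 0-based index) is our assumption. The search is exponential and only meant for tiny formulas.

```python
from itertools import combinations

def build_r(clauses):
    """r_Phi as 4-tuples (variable, value, i, i+1 mod m); clause phi_i is the
    index i (0-based here); literals are DIMACS-style nonzero ints."""
    m = len(clauses)
    return {(abs(lit), int(lit > 0), i, (i + 1) % m)
            for i, clause in enumerate(clauses) for lit in clause}

def consistent(ts):
    value = {}
    for a1, a2, _, _ in ts:
        if value.setdefault(a1, a2) != a2:
            return False                                   # A1 -> A2 violated
    return {t[2] for t in ts} <= {t[3] for t in ts}        # A3 ⊆ A4

def empty_is_repair(r):
    """Brute force: the empty set is a repair iff no nonempty subset is consistent."""
    r = sorted(r)
    return not any(consistent(s) for k in range(1, len(r) + 1)
                   for s in combinations(r, k))

print(empty_is_repair(build_r([[1], [-1]])))        # True: p1 and not p1 is unsatisfiable
print(empty_is_repair(build_r([[1, 2], [-1]])))     # False: p1 = 0, p2 = 1 satisfies it
```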

4.3.2 Key FDs and foreign key constraints 

Theorem 10 There exist a database schema and a set IC of integrity constraints, consisting of key FDs and foreign key constraints, such that $B_{IC}$ is co-NP-hard.

Proof. Again we consider checking whether the empty set is a repair. The schema consists of 10 binary relations: $R(A, B)$ and $R_{ij}(A_{ij}, B_{ij})$ with $1 \le i, j \le 3$. For each pair $(i, j)$ both key dependencies $A_{ij} \to B_{ij}$ and $B_{ij} \to A_{ij}$ are in IC, with $A_{ij}$ as the primary key of the respective relation. The relation R is constrained by a single key dependency $B \to A$. The inclusion constraints are $B_{ij} \subseteq B$ for each pair i, j, and $A \subseteq A_{ij}$, also for each pair i, j.

Consider a propositional formula $\Phi = \phi_1 \wedge \phi_2 \wedge \ldots \wedge \phi_m$, where the $\phi_j$ are clauses. We assume that none of the clauses in $\Phi$ contains more than 3 literals, that each variable occurs at most 3 times in $\Phi$ and that the number of variables in $\Phi$ is equal to the number m of clauses in the formula. It is easy to prove that satisfiability is NP-complete even for formulae of this kind. For the formula $\Phi$ we build a database instance $r_\Phi$. In the relation R we remember the formula $\Phi$: it consists of such pairs $(w, \phi)$ that w is a literal, $\phi$ is a clause from $\Phi$ and w occurs in $\phi$. The definitions of the relations $R_{ij}$ are a little bit more complicated. The relation $R_{ij}$ consists of 2m tuples $(p_l, \phi_{s(i,j,l)})$ and $(\neg p_l, \phi_{s(i,j,l)})$, where s, still to be defined, will be a function from $\{1, 2, 3\} \times \{1, 2, 3\} \times \{1, 2, \ldots, m\}$ to $\{1, 2, \ldots, m\}$ and, more precisely, it is going to be a permutation of $\{1, 2, \ldots, m\}$ for every fixed pair $(i, j)$. Define $s(i, j, l)$ as n if $p_l$ (or $\neg p_l$) occurs in the clause $\phi_{n+1}$ (where addition is modulo the number of clauses m), if $p_l$ is the i-th variable in this clause, and if it is the j-th occurrence of $p_l$ in $\Phi$. Now, for each pair $(i, j)$, let $s(i, j, \_)$ be any permutation consistent with the above definition. It follows directly from our construction that:

Lemma 5 For each clause $\phi_n$ from $\Phi$ and for each variable p occurring in $\phi_n$ there is a relation $R_{ij}$ such that the tuples $(p, \phi_{n-1})$ and $(\neg p, \phi_{n-1})$ are in $R_{ij}$.

We want to show that $\emptyset$ is a repair of $r_\Phi$ with respect to IC if and only if $\Phi$ is not satisfiable.

The only if direction is simple. Assume that $\Phi$ is satisfiable and let $\rho$ be a satisfying assignment. In each tuple in each of the relations R, $R_{ij}$ in $r_\Phi$ the first argument is always a literal. Let r' be the subset of $r_\Phi$ consisting of those facts $R(w, \phi)$ or $R_{ij}(w, \phi)$ such that $\rho(w) = 1$. The key constraints for $R_{ij}$ are satisfied in r'. The inclusion constraints $B_{ij} \subseteq B$ are satisfied because, since $\rho$ is an assignment satisfying $\Phi$, $B = \{\phi_1, \phi_2, \ldots, \phi_m\}$. Also the inclusions $A \subseteq A_{ij}$ hold. But the key dependency $B \to A$ does not need to hold in r' (this is because there is possibly more than one literal w in some clause such that $\rho(w) = 1$). To construct a nonempty repair of $r_\Phi$, take now r'' built with the same relations $R_{ij}$ as r' and with relation R being the result of selecting from the relation R in r' exactly one tuple $(w, \phi)$ for each $\phi$.

The if direction is more complicated. If r' is a repair of $r_\Phi$ then, in each of the relations $R_{ij}$, for each clause $\phi_{s(i,j,l)}$ at most one of the tuples $(p_l, \phi_{s(i,j,l)})$ and $(\neg p_l, \phi_{s(i,j,l)})$ can be in $R_{ij}$. This implies that at most one of the literals $p_l, \neg p_l$ can be in $A_{ij}$. But $A \subseteq A_{ij}$ and, since $\Phi$ is not satisfiable, there must be a clause $\phi_k$ such that none of the literals from $\phi_k$ is in A. This means that $\phi_k$ is not in B. Consider the clause $\phi_{k+1}$. By Lemma 5, for each variable p from $\phi_{k+1}$ there is a relation $R_{ij}$ such that the tuples $(p, \phi_k)$ and $(\neg p, \phi_k)$ are in $R_{ij}$ in $r_\Phi$. But, by the inclusion constraints, each of the $B_{ij}$ should be a subset of B, so since $\phi_k$ is not in B in r', it is also not in any of the $B_{ij}$ in r'. While removing $\phi_k$ from $B_{ij}$ we also delete the literals occurring in a tuple of $R_{ij}$ together with $\phi_k$. This means that for each variable p from the clause $\phi_{k+1}$ there is a relation $R_{ij}$ such that neither p nor $\neg p$ is in $A_{ij}$. But A is a subset of each of the $A_{ij}$. This means that none of the literals from $\phi_{k+1}$ can be in A. So $\phi_{k+1}$ cannot be in B. Now, using this argument m times, we can remove all the tuples from the relations, thus proving that r' is empty. ■



4.4 Arbitrary FDs and INDs 

Theorem 11 The repair checking problem for arbitrary FDs and INDs is co-NP-complete. 

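Proof. Co-NP-hardness was established earlier in this section. Membership in co-NP follows from the definition of repair: if r' is not a repair of r, then either r' is not a consistent subset of r, or a consistent r'' with $r' \subsetneq r'' \subseteq r$ serves as a polynomially checkable witness. ■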

Theorem 12 The consistent query answers problem for arbitrary FDs and INDs is $\Pi^p_2$-complete.

Proof. Membership in $\Pi^p_2$ follows from the definition of consistent query answer. We show $\Pi^p_2$-hardness below. Consider a quantified boolean formula $\varphi$ of the form

$\forall p_1, p_2, \ldots, p_k\ \exists q_1, q_2, \ldots, q_l\ \psi$

where $\psi$ is quantifier-free and equals $\psi_1 \wedge \psi_2 \wedge \ldots \wedge \psi_m$, where the $\psi_i$ are clauses. We will construct a database instance $r_\varphi$ over a schema with a single relation $R(A, B, C, D)$, such that $R(a, a, \psi_1, a)$ is a consistent answer if and only if $\varphi$ is true. The integrity constraints will be $A \to B$ and $C \subseteq D$.

There are 3 kinds of tuples in $r_\varphi$. For each occurrence of a literal in $\psi$ we have one tuple of the first kind (we adopt the convention that $\psi_{m+1}$ is $\psi_1$):

• $R(p_i, 1, \psi_j, \psi_{j+1})$ if $p_i$ occurs positively in $\psi_j$,

• $R(q_i, 1, \psi_j, \psi_{j+1})$ if $q_i$ occurs positively in $\psi_j$,

• $R(p_i, 0, \psi_j, \psi_{j+1})$ if $p_i$ occurs negatively in $\psi_j$,

• $R(q_i, 0, \psi_j, \psi_{j+1})$ if $q_i$ occurs negatively in $\psi_j$.

For each universally quantified variable $p_i$ we have two tuples of the second kind: $R(p_i, 1, a_i, a_i)$ and $R(p_i, 0, a_i, a_i)$. Finally, there is just one tuple of the third kind: $R(a, a, \psi_1, a)$.

Let us first show that if $\varphi$ is false then $R(a, a, \psi_1, a)$ is not a consistent answer. Let $\sigma$ be a valuation of the variables $p_1, p_2, \ldots, p_k$ such that the formula $\sigma(\psi)$ (with free variables $q_1, q_2, \ldots, q_l$) is not satisfiable. It will be enough to show that the set $s_\sigma$ of all the tuples from $r_\varphi$ which are of the form $R(p_i, \sigma(p_i), a_i, a_i)$ is a repair. The set $s_\sigma$ is consistent. So if it is not a repair then another consistent subset $s \supsetneq s_\sigma$ of $r_\varphi$ must exist. Due to the FD, s does not contain any tuple of the second kind not already in $s_\sigma$. So there must be some tuple of the first or the third kind in s. But that means (due to the IND) that for each $\psi_j$ there is either some tuple of the form $R(p_i, \sigma(p_i), \psi_j, \psi_{j+1})$ in s, or some tuple of the form $R(q_i, \varepsilon_i, \psi_j, \psi_{j+1})$, where $\varepsilon_i \in \{0, 1\}$. Due to the FD, for each $q_i$ there can be at most one such $\varepsilon_i$. Define $\theta(q_i) = \varepsilon_i$ (and arbitrarily for the remaining $q_i$). Then $\theta(\sigma(\psi)) = 1$, which is impossible.

For the opposite direction, suppose that $\varphi$ is true but $R(a, a, \psi_1, a)$ is not a consistent answer. The latter means that there exists a repair s of $r_\varphi$ such that no tuple of the form $R(\_, \_, \psi_1, \_)$ can be found in s (if s contained such a tuple then, by $C \subseteq D$, $\psi_1$ would occur in the D column of s, and $R(a, a, \psi_1, a)$ could be added to s). But this implies that there are no tuples of the first kind in s, and so s only consists of some tuples of the second kind. Due to the FD there exists a valuation $\sigma$ such that s consists of all the tuples of the second kind which are of the form $R(p_i, \sigma(p_i), a_i, a_i)$. Since $\varphi$ is true, there exists a valuation $\theta$ of the variables $q_1, q_2, \ldots, q_l$ such that $\theta(\sigma(\psi)) = 1$. But then the set s' consisting of all the tuples from s, $R(a, a, \psi_1, a)$, and all the tuples of the first kind which are either of the form $R(p_i, \sigma(p_i), \psi_j, \psi_{j+1})$ or $R(q_i, \theta(q_i), \psi_j, \psi_{j+1})$ is consistent, which contradicts the assumption that s is a repair. ■
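A brute-force sketch of the construction of $r_\varphi$ and of the equivalence proved above, usable only on toy formulas since the problem is $\Pi^p_2$-hard. The encoding is our assumption: variables are strings, each clause is a list of `(variable, positive)` pairs, the clause $\psi_j$ is represented by the string `psi<j>`, and the constant $a_i$ attached to a universal variable `p` by the string `a_p`.

```python
from itertools import combinations

def build_r_phi(universal, clauses):
    """universal: names of the p-variables; clauses: lists of (variable, positive)
    pairs over p- and q-variables. Clause psi_j is the string 'psi<j>'."""
    m = len(clauses)
    psi = [f"psi{j + 1}" for j in range(m)]
    first = {(x, int(pos), psi[j], psi[(j + 1) % m])
             for j, clause in enumerate(clauses) for x, pos in clause}
    second = {(p, v, f"a_{p}", f"a_{p}") for p in universal for v in (0, 1)}
    return first | second | {("a", "a", "psi1", "a")}

def consistent(ts):
    value = {}
    for a, b, _, _ in ts:
        if value.setdefault(a, b) != b:
            return False                                # A -> B violated
    return {t[2] for t in ts} <= {t[3] for t in ts}     # C ⊆ D

def is_consistent_answer(r, fact):
    """fact holds in every repair (brute force over all subsets)."""
    r = sorted(r, key=repr)                             # mixed int/str columns
    cons = [frozenset(s) for k in range(len(r) + 1)
            for s in combinations(r, k) if consistent(s)]
    return all(fact in s for s in cons if not any(s < o for o in cons))

# forall p exists q: (p or q) and (not p or not q) is true (take q = not p).
true_qbf = build_r_phi(["p"], [[("p", True), ("q", True)], [("p", False), ("q", False)]])
# forall p exists q: p and q is false (take p = 0).
false_qbf = build_r_phi(["p"], [[("p", True)], [("q", True)]])
goal = ("a", "a", "psi1", "a")
print(is_consistent_answer(true_qbf, goal), is_consistent_answer(false_qbf, goal))  # True False
```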



5 Related work 



We only briefly survey the related work here. A more comprehensive discussion can be 
found in |ABC99| , pCU^ . 

There are several similarities between our approach to consistency handling and those followed by the belief revision/update community [GR95]. Database repairs (Definition ^) coincide with revised models defined by Winslett in [Win88]. The treatment in [Win88] is mainly propositional, but a preliminary extension to first order knowledge bases can be found in [^W94]. Those papers concentrate on the computation of the models of the revised theory, i.e., the repairs in our case. Comparing our framework with that of belief revision, we have an empty domain theory, one model (the database instance), and a revision by a set of ICs. The revision of a database instance by the ICs produces new database instances, the repairs of the original database. The complexity of belief revision (and of the related problem of counterfactual inference, which corresponds to our computation of consistent query answers) in the propositional case was exhaustively classified by Eiter and Gottlob [EG92]. Among the constraint classes considered in the current paper, only denial constraints can be represented propositionally by grounding. However, such grounding results in an unbounded update formula, which prevents the transfer of any of the PTIME upper bounds from [EG92] into our framework. Similarly, their lower bounds require different kinds of formulas from those that we use.

The need to accommodate violations of functional dependencies is one of the main motivations for considering disjunctive databases [INV91, vdM98] and has led to various proposals in the context of data integration [AKWS95, BKMS92, Dun96, LM96]. There seems to be an intriguing connection between relation repairs w.r.t. FDs and databases with disjunctive information [vdM98]. For example, the set of repairs of the relation Person from Example ^ can be represented as a disjunctive database D consisting of the formulas

Person(Brown, Amherst, 115 Klein) ∨ Person(Brown, Amherst, 120 Maple)

and

Person(Green, Clarence, 4000 Transit).

Each repair corresponds to a minimal model of D and vice versa. We conjecture that the set of all repairs of an instance w.r.t. a set of FDs can be represented as a disjunctive table (with rows that are disjunctions of atoms with the same relation symbol). The relationship in the other direction does not hold, as shown by the following example [ABC+03].
Example 7 The set of minimal models of the formula

$(p(a_1, b_1) \vee p(a_2, b_2)) \wedge p(a_3, b_3)$

cannot be represented as a set of repairs of any set of FDs. □
Known tractable classes of first-order queries over disjunctive databases typically involve conjunctive queries and databases with restricted OR-objects [INV91, IvdMV95]. In some cases, like in Example ^, the set of all repairs can be represented as a table with OR-objects. But in general this is not the case [ABC+03].



Example 8 Consider the following set of FDs $F = \{A \to B, A \to C\}$, which is in BCNF. The set of all repairs of the instance $\{(a_1, b_1, c_1), (a_1, b_2, c_2)\}$ cannot be represented as a table with OR-objects. □

The relationship in the other direction, from tables with OR-objects to sets of repairs, also 
does not hold. 

Example 9 Consider the following table R with OR-objects:

    OR(a, b)    c
    a           OR(c, d)

R does not represent the set of all repairs of any instance under any set of FDs. □



In general, a correspondence between sets of repairs and tables with OR-objects holds only in the very restricted case when the relation is binary, say $R(A, B)$, and there is one FD $A \to B$. The paper [IvdMV95] contains a complete classification of the complexity of conjunctive queries for tables with OR-objects. It is shown how the complexity depends on whether the tables satisfy various schema-level criteria governing the allowed occurrences of OR-objects. Since there is no exact correspondence between tables with OR-objects and sets of repairs of a given database instance, the results of [IvdMV95] do not directly translate to our framework, and vice versa.

There are several proposals for language constructs specifying nondeterministic queries that are related to our approach (witness [AHV95], choice [GGSZ97, GP98, GSZ95]). Essentially, the idea is to construct a maximal subset of a given relation that satisfies a given set of functional dependencies. Since there is usually more than one such subset, the approach yields nondeterministic queries in a natural way. Clearly, maximal consistent subsets (choice models [GGSZ97]) correspond to repairs. Datalog with choice [GGSZ97] is, in a sense, more general than our approach, since it combines enforcing functional dependencies with inference using Datalog rules. Answering queries in all choice models (∀G-queries [GSZ95]) corresponds to our notion of computation of consistent query answers (Definition ^). However, the former problem is shown to be co-NP-complete and no tractable cases are identified. One of the sources of complexity in this case is the presence of Datalog rules, absent from our approach. Moreover, the procedure proposed in [GSZ95] runs in exponential time if there are exponentially many repairs, as in Example ^. Also, only conjunctions of literals are considered as queries in [GSZ95].

A purely proof-theoretic notion of consistent query answer comes from Bry [Bry97]. This notion, described only in the propositional case, corresponds to evaluating queries after all the tuples involved in inconsistencies have been eliminated. The paper [ABC99] introduced the notions of repair and consistent query answer used in the current research. It proposed computing consistent query answers through query transformation. The papers [ABC01, ABC+03] studied the computation of consistent query answers in the context of FDs and scalar aggregation queries.

Wijsen [Wij03] studied the problem of consistent query answering in the context of universal constraints. In contrast to Definition ^, he considers repairs obtained by modifying individual tuple components. Notice that a modification of a tuple component cannot necessarily be simulated as a deletion followed by an insertion, because this might not be minimal under set inclusion. Wijsen proposes to represent all the repairs of an instance using a single trustable tableau. From this tableau, answers to conjunctive queries can be efficiently obtained. It is not clear, however, what the computational complexity of constructing the tableau is, or even whether the tableau is always of polynomial size.

Representing repairs as stable models of logic programs with disjunction and classical negation has been proposed in [ABC00, GGZ01]. Those papers consider computing consistent answers to first-order queries. While the approach is very general, no tractable cases beyond those already implicit in the results of [ABC99] are identified. The semantics of referential integrity actions are captured using stable models of logic programs with negation in [LML97].

It is interesting to contrast our results in Section 4 with the classical results about the implication problem for FDs and INDs [AHV95]. This problem is undecidable in general but becomes decidable under suitable restrictions on INDs. For instance, it is decidable in PTIME if the INDs are unary and in EXPTIME if the INDs are acyclic. The problems discussed in our paper are all in $\Pi^p_2$ (Section ^). The role the syntactic restrictions play in this context is different. The restriction to unary INDs is not helpful, cf. Theorem 11. The restriction to acyclic INDs makes the repair checking problem tractable (Theorem 7) but not the problem of consistent query answers (Theorem 8).

In [MR92], several classes of FDs and INDs were identified for which the implication
problem does not exhibit any interaction between the FDs and the INDs. I.e., a set of 
constraints implies an FD (resp. an IND) iff the FDs (resp. the INDs) in this set imply 
it. Unfortunately, the syntactic restrictions on constraints that guarantee no interaction 
in the above sense do not play a similar role in our context. It seems that the notion of 
maximality present in the repair definition forces a relationship between the FDs and the 
INDs that is much tighter than the one implicit in the implication problem. 

In [MM90, MR92], it is investigated what kind of relational schemas and integrity constraints can result from mapping an Entity-Relationship schema (this is a common way of designing relational schemas). Acyclicity of INDs is a necessary requirement, thus repair checking is tractable in this case. However, it turns out that the schema from Theorem 8 could result from such a mapping. Thus, even restricting the relational schemas to those that correspond to Entity-Relationship schemas does not guarantee the tractability of consistent query answers.



6 Conclusions and future work 

In this paper we have investigated the computational complexity issues involved in minimal- 
change integrity maintenance using tuple deletions, in the presence of denial constraints 
and inclusion dependencies. We have identified several tractable cases and shown that 
generalizing them leads to intractability. 
We envision several possible directions for future work. First, one can consider various preference orderings on repairs. Such orderings are often natural and may lead to further tractable cases. Some preliminary work in this direction is reported in [GGZ01]. Second, a natural scenario for applying the results developed in this paper is query rewriting in the presence of distributed data sources [pGLOq, Hal01, Len02]. Recent work in this area has started to address the issues involved in data sources being inconsistent [BCCG02, LLR02].

Finally, as XML is playing an increased role in data integration [PV99, LPV00, DHW01], it would be interesting and challenging to develop the appropriate notions of repair and consistent query answer in the context of XML databases. Recent integrity constraint proposals for XML include [BDF+01, pSOH, FKS01].